arXiv submission date: 2026-06-25
📄 Abstract - Temporally Consistent Label Interpolation for Robust Surgical Multi-Task Learning under Challenging Conditions

Effective multi-task learning for surgical scene understanding is fundamentally hindered by annotation granularity mismatch: temporal workflow tasks such as phase recognition, step recognition, and anticipation benefit from dense frame-level supervision, whereas pixel-level spatial tasks including instrument segmentation and action recognition are only sparsely annotated on selected keyframes due to prohibitive labeling costs. This supervision imbalance undermines shared representation learning and limits joint optimization across heterogeneous surgical tasks. To address this, we propose Flow-guided Annotation for Robust Operating Scenes (FAROS), a flow-guided label interpolation framework that combines zero-shot segmentation-based mask propagation with optical flow estimation. This combination overcomes the limitations of appearance-based propagation under challenging surgical conditions such as occlusion, smoke, and motion blur, and generates temporally consistent dense pseudo labels from sparse keyframe annotations. The densified instrument masks and action labels are integrated into a unified Transformer-based multi-task framework that jointly learns surgical phase recognition, step recognition, anticipation, instrument segmentation, and action recognition, enabling balanced optimization between dense temporal supervision and sparse spatial supervision. The label interpolation quality of FAROS is first validated on the DAVIS 2017 benchmark under a sparse ground-truth protocol, confirming robust propagation beyond the surgical domain. Extensive experiments on the GraSP, MISAW, and AutoLaparo benchmarks further demonstrate that FAROS significantly improves cross-task representation learning and enhances holistic surgical scene understanding performance across spatio-temporal tasks.
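
The general idea of flow-guided label propagation can be illustrated with a minimal sketch. This is not the FAROS implementation, which the abstract does not specify. It only shows the generic step of warping a keyframe mask to the next frame with a dense backward optical flow field, using nearest-neighbor lookup in NumPy. The function names `warp_mask_with_flow` and `propagate_labels`, the `(dx, dy)` flow layout, and the choice of label 0 as background are all assumptions made for illustration.

```python
import numpy as np


def warp_mask_with_flow(mask, backward_flow, background=0):
    """Warp an integer label mask to the next frame (illustrative only).

    mask:          (H, W) integer labels on the annotated frame.
    backward_flow: (H, W, 2) array; for each pixel of the NEXT frame it holds
                   the (dx, dy) displacement to its source pixel in `mask`.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates in the annotated frame, rounded to the nearest pixel.
    src_x = np.rint(xs + backward_flow[..., 0]).astype(int)
    src_y = np.rint(ys + backward_flow[..., 1]).astype(int)
    # Pixels whose source falls outside the image receive the background label.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.full_like(mask, background)
    warped[valid] = mask[src_y[valid], src_x[valid]]
    return warped


def propagate_labels(keyframe_mask, backward_flows):
    """Chain the warp across consecutive frames, starting from one keyframe."""
    masks = [keyframe_mask]
    for flow in backward_flows:
        masks.append(warp_mask_with_flow(masks[-1], flow))
    return masks
```

In practice the flow fields would come from an optical flow estimator, and chained warping accumulates drift over long gaps. That drift is one reason a pure flow warp is usually combined with other cues, as the abstract describes for FAROS.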

Top-level tags: computer vision, medical
Detailed tags: multi-task learning, surgical scene understanding, label interpolation, optical flow, video segmentation

Temporally Consistent Label Interpolation for Robust Surgical Multi-Task Learning under Challenging Conditions


1️⃣ One-sentence summary

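The paper proposes FAROS, a label interpolation framework that uses optical flow estimation and zero-shot segmentation to automatically expand sparse keyframe annotations into dense, temporally consistent, high-quality pseudo labels. This balances the annotation gap between temporal tasks (such as phase recognition) and spatial tasks (such as instrument segmentation) in surgical video and significantly improves overall multi-task learning performance.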

Source: arXiv: 2606.26634