Training-free Detection and 6D Pose Estimation of Unseen Surgical Instruments
1️⃣ One-sentence summary
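This paper presents a training-free method that, given only a textured CAD model of each instrument, detects previously unseen surgical instruments in multi-view surgical scenes and estimates their 6D pose with millimeter accuracy, on par under controlled conditions with supervised methods that require extensive annotated data.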
Purpose: Accurate detection and 6D pose estimation of surgical instruments are crucial for many computer-assisted interventions. However, supervised methods lack flexibility for new or unseen tools and require extensive annotated data. This work introduces a training-free pipeline for accurate multi-view 6D pose estimation of unseen surgical instruments, which only requires a textured CAD model as prior knowledge.
Methods: Our pipeline consists of two main stages. First, for detection, we generate object mask proposals in each view and score their similarity to rendered templates using a pre-trained feature extractor. Detections are matched across views, triangulated into 3D instance candidates, and filtered using multi-view geometric consistency. Second, for pose estimation, a set of pose hypotheses is iteratively refined and scored using feature-metric scores with cross-view attention. The best hypothesis undergoes a final refinement using a novel multi-view, occlusion-aware contour registration, which minimizes reprojection errors of unoccluded contour points.
Results: The proposed method was rigorously evaluated on real-world surgical data from the MVPSP dataset. The method achieves millimeter-accurate pose estimates that are on par with supervised methods under controlled conditions, while maintaining full generalization to unseen instruments. These results demonstrate the feasibility of training-free, marker-less detection and tracking in surgical scenes, and highlight the unique challenges in surgical environments.
Conclusion: We present a novel and flexible pipeline that effectively combines state-of-the-art foundational models, multi-view geometry, and contour-based refinement for high-accuracy 6D pose estimation of surgical instruments without task-specific training. This approach enables robust instrument tracking and scene understanding in dynamic clinical environments.
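To make the final refinement step more concrete, the following is a minimal sketch of the kind of objective an occlusion-aware, multi-view contour registration could minimize: the mean squared reprojection error of contour points, counting only points flagged as unoccluded in each view. It is an illustration under stated assumptions, not the authors' implementation. The function names (`transform`, `project`, `contour_cost`), the per-view dictionary layout, the pinhole camera model without distortion, and the given point correspondences and visibility flags are all hypothetical; the abstract does not specify how contours are extracted, how occlusion is determined, or which optimizer is used.

```python
def transform(R, t, p):
    """Apply the rigid transform x -> R x + t to a 3D point (R is 3x3, t is length 3)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def project(K, p_cam):
    """Pinhole projection of a camera-frame point; K = (fx, fy, cx, cy) is assumed."""
    fx, fy, cx, cy = K
    return (fx * p_cam[0] / p_cam[2] + cx, fy * p_cam[1] / p_cam[2] + cy)


def contour_cost(R_obj, t_obj, contour_pts, views):
    """Mean squared reprojection error over unoccluded contour points in all views.

    R_obj, t_obj : candidate object pose (object frame -> world frame).
    contour_pts  : 3D contour points in the object frame.
    views        : list of dicts (hypothetical layout) with keys
                   "R", "t"     world -> camera extrinsics,
                   "K"          intrinsics (fx, fy, cx, cy),
                   "observed"   one 2D image point per contour point,
                   "visible"    one bool per contour point (False = occluded).
    """
    total, count = 0.0, 0
    for view in views:
        for i, (u_obs, v_obs) in enumerate(view["observed"]):
            if not view["visible"][i]:
                continue  # occluded points do not contribute to the cost
            p_world = transform(R_obj, t_obj, contour_pts[i])
            p_cam = transform(view["R"], view["t"], p_world)
            u, v = project(view["K"], p_cam)
            total += (u - u_obs) ** 2 + (v - v_obs) ** 2
            count += 1
    # With no visible points the cost is undefined; return infinity as a sentinel.
    return total / count if count else float("inf")
```

A refinement loop would evaluate `contour_cost` for perturbed poses and keep the pose with the lowest value. Masking occluded points keeps unreliable image evidence, for example a contour hidden by a hand or another instrument, from pulling the estimate away from the true pose.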
Source: arXiv: 2603.25228