Real-Time Multimodal Activity-Aware Error Detection in Robot-Assisted Surgery

📄 Abstract

Robot-assisted minimally invasive surgery improves surgical precision but introduces complexity, making technical error detection essential for patient safety. Current methods for detecting executional errors from video data often overlook fine-grained contextual descriptions of activities and error types within the hierarchical structure of surgical procedures, and they under-utilize complementary multimodal information. We propose a unified framework for executional error detection that leverages multimodal input, including video, kinematics, and descriptive textual prompts. Through activity prompting, we integrate descriptive language for gesture-level activities, instrument-object interactions, and error definitions. We also introduce activity-aware visual embeddings, derived from vision encoders pretrained on surgical activity labels, and use them to compare the effectiveness of contrastive language-image embeddings with that of traditional image-based embeddings for error detection. By integrating kinematic data with the video and textual modalities, our framework significantly improves error detection performance, achieving F1 score improvements of up to 5% and 16.6% over state-of-the-art baselines on the JIGSAWS and SAR-RARP50 datasets, respectively. These results demonstrate the value of combining curated textual prompts with multimodal data for accurate error detection.
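The abstract describes fusing video, kinematics, and textual prompts but gives no implementation details. The sketch below is a hypothetical illustration, not the authors' architecture. It assumes that frame embeddings and prompt embeddings already share one vector space (as with a contrastive language-image encoder), scores each prompt (e.g. a gesture description or an error definition) by cosine similarity, concatenates those scores with the visual embedding and kinematic features, and applies a placeholder linear classifier. The `f1_score` helper illustrates the metric the abstract reports. All function names, weights, and toy vectors are invented for illustration.

```python
"""Hypothetical sketch: prompt-similarity features fused with kinematics."""
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings assumed to share one space."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def prompt_similarity_features(
    frame_emb: Sequence[float], prompt_embs: Sequence[Sequence[float]]
) -> List[float]:
    """One score per textual prompt (hypothetical gesture/error descriptions)."""
    return [cosine_similarity(frame_emb, p) for p in prompt_embs]


def fuse_features(
    frame_emb: Sequence[float],
    prompt_embs: Sequence[Sequence[float]],
    kinematics: Sequence[float],
) -> List[float]:
    """Simple concatenation fusion: visual embedding + prompt scores + kinematics."""
    return (
        list(frame_emb)
        + prompt_similarity_features(frame_emb, prompt_embs)
        + list(kinematics)
    )


def error_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Placeholder linear classifier followed by a sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with 1 = error; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D embeddings: the frame matches the first (hypothetical) prompt exactly.
    frame = [1.0, 0.0]
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    kin = [0.5]  # a single made-up kinematic feature
    feats = fuse_features(frame, prompts, kin)
    print(feats)
    print(error_probability(feats, [0.0] * len(feats), 0.0))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Concatenation is only the simplest possible fusion choice; in practice the classifier weights would be learned, and the paper may use a different fusion mechanism entirely.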


1️⃣ One-Sentence Summary

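The paper proposes a unified framework that combines video, kinematic data, and textual descriptions and, through activity prompting and activity-aware visual embeddings, significantly improves the accuracy of technical error detection in robot-assisted surgery, raising F1 scores by up to 5% and 16.6% on two public datasets, respectively.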
