arXiv submission date: 2026-02-03
📄 Abstract - Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation

Language-referred audio-visual segmentation (Ref-AVS) aims to segment target objects described by natural language by jointly reasoning over video, audio, and text. Beyond generating segmentation masks, providing rich and interpretable diagnoses of mask quality remains largely underexplored. In this work, we introduce Mask Quality Assessment in the Ref-AVS context (MQA-RefAVS), a new task that evaluates the quality of candidate segmentation masks without relying on ground-truth annotations as references at inference time. Given audio-visual-language inputs and each provided segmentation mask, the task requires estimating its IoU with the unobserved ground truth, identifying the corresponding error type, and recommending an actionable quality-control decision. To support this task, we construct MQ-RAVSBench, a benchmark featuring diverse and representative mask error modes that span both geometric and semantic issues. We further propose MQ-Auditor, a multimodal large language model (MLLM)-based auditor that explicitly reasons over multimodal cues and mask information to produce quantitative and qualitative mask quality assessments. Extensive experiments demonstrate that MQ-Auditor outperforms strong open-source and commercial MLLMs and can be integrated with existing Ref-AVS systems to detect segmentation failures and support downstream segmentation improvement. Data and codes will be released at this https URL.
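To make the task's input/output contract concrete, here is a minimal illustrative sketch in Python. The names (`MaskAudit`, `audit_mask`, and the error/decision enums) are assumptions for illustration only, not the paper's actual API; the abstract only states that, given audio-visual-language inputs and a candidate mask, the auditor must estimate IoU against the unobserved ground truth, identify the error type, and recommend a quality-control decision.

```python
from dataclasses import dataclass
from enum import Enum


class MaskErrorType(Enum):
    """Hypothetical error taxonomy; MQ-RAVSBench's actual categories span
    geometric and semantic issues but are not enumerated in the abstract."""
    ACCURATE = "accurate"
    GEOMETRIC = "geometric"   # e.g., boundary drift or partial coverage
    SEMANTIC = "semantic"     # e.g., the wrong object was segmented


class QCDecision(Enum):
    """Hypothetical quality-control actions."""
    ACCEPT = "accept"
    REFINE = "refine"
    RESEGMENT = "resegment"


@dataclass
class MaskAudit:
    predicted_iou: float        # estimated IoU against the unobserved ground truth
    error_type: MaskErrorType
    decision: QCDecision


def audit_mask(video_frames, audio_waveform, referring_text, candidate_mask) -> MaskAudit:
    """Reference-free audit of one candidate mask (illustrative stub only).

    In MQ-Auditor this reasoning is performed by a multimodal LLM over the
    audio-visual-language inputs and the mask; this stub merely fixes the
    input/output contract implied by the MQA-RefAVS task definition.
    """
    raise NotImplementedError("placeholder for an MLLM-based auditor")
```

Any concrete auditor would replace the stub body with model inference; the structure above only mirrors the three outputs the task requires (estimated IoU, error type, quality-control decision).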

Top-level tags: multi-modal, model evaluation, computer vision
Detailed tags: mask quality assessment, audio-visual segmentation, multimodal LLM, reference-free evaluation, segmentation benchmark

Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation


1️⃣ One-Sentence Summary

This paper introduces MQA-RefAVS, a new task that automatically assesses the quality of segmentation masks produced in language-referred audio-visual segmentation without relying on ground-truth annotations: for each candidate mask it predicts accuracy (IoU), identifies the error type, and recommends a quality-control action. To support the task, the authors build the MQ-RAVSBench benchmark and MQ-Auditor, an assessor based on a multimodal large language model.

Source: arXiv 2602.03892