Abstract - PrefSQA: Pairwise Preference Prediction for Speech Quality Assessment and the Critical Role of High Quality Datasets
Mean opinion scores (MOS) are widely used for speech quality assessment, yet scalar labels are sensitive to rater variability and listening test differences. This introduces labeling noise, which limits the reliability of MOS prediction. Preference prediction reduces this variability as listeners compare signals directly, producing cleaner labels. We study MOS-free preference prediction and propose PrefSQA, which incorporates uncertainty-aware logits, an impairment attention head, and a module based on non-matching-reference comparisons. We use and refine five datasets, including MOS-derived and low-noise simulated sets with matching and non-matching content, experiment with human preference sets, and test on unseen data. Experiments show small improvements on MOS-derived data, while other sets reveal clear improvement over the baselines, highlighting the value of high-quality preference data and demonstrating the effectiveness of the proposed method.
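To make the pairwise setup concrete, here is a minimal sketch of how scalar MOS labels could be turned into preference pairs and scored with a Bradley-Terry style logistic loss. It is not the PrefSQA implementation: the function names, the `min_gap` threshold for discarding ambiguous pairs, and the plain sigmoid over a score difference are all assumptions made for illustration, and the uncertainty-aware logits, impairment attention head, and non-matching-reference module are not modeled.

```python
import math
from itertools import combinations


def mos_to_preference_pairs(mos, min_gap=0.5):
    """Turn scalar MOS labels into pairwise preference labels.

    mos: dict mapping clip id -> MOS value.
    min_gap: hypothetical threshold; pairs whose MOS values differ by less
             than this are dropped as too ambiguous to label.
    Returns a list of (clip_a, clip_b, label), label 1 if clip_a is preferred.
    """
    pairs = []
    for a, b in combinations(sorted(mos), 2):
        gap = mos[a] - mos[b]
        if abs(gap) < min_gap:
            continue  # skip near-ties, where rater noise dominates
        pairs.append((a, b, 1 if gap > 0 else 0))
    return pairs


def preference_probability(score_a, score_b):
    """Probability that A is preferred over B: sigmoid of the score difference."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def pairwise_bce(score_a, score_b, label, eps=1e-12):
    """Binary cross-entropy between the predicted preference and the label."""
    p = preference_probability(score_a, score_b)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


if __name__ == "__main__":
    example_mos = {"a": 4.2, "b": 3.1, "c": 4.0}  # made-up values
    print(mos_to_preference_pairs(example_mos))
    print(pairwise_bce(2.0, 0.0, 1))
```

In this sketch the pair ("a", "c") is dropped because the two MOS values are closer than `min_gap`, which reflects the intuition that near-ties in MOS-derived data are where label noise is most likely to flip a preference.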
PrefSQA: Pairwise Preference Prediction for Speech Quality Assessment and the Critical Role of High Quality Datasets
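1️⃣ One-sentence summary
The paper proposes PrefSQA, a speech quality assessment method built on preference labels obtained by having listeners directly compare two speech clips, which are more reliable than scalar ratings; it pairs these labels with a model that combines uncertainty-aware logits and an attention mechanism, and experiments show a clear improvement over traditional rating-based methods on high-quality preference data.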