Bias and Fairness in Self-Supervised Acoustic Representations for Cognitive Impairment Detection

📄 Abstract

Speech-based detection of cognitive impairment (CI) offers a promising non-invasive approach to early diagnosis, yet performance disparities across demographic and clinical subgroups remain underexplored, raising concerns about fairness and generalizability. This study presents a systematic bias analysis of acoustic-based CI and depression classification on the DementiaBank Pitt Corpus. We compare traditional acoustic features (MFCCs, eGeMAPS) with contextualized speech embeddings from Wav2Vec 2.0 (W2V2) and evaluate classification performance across gender, age, and depression-status subgroups. For CI detection, higher-layer W2V2 embeddings outperform the baseline features (UAR up to 80.6%) but exhibit performance disparities. Female and younger participants show lower discriminative power (AUC of 0.769 and 0.746, respectively) and substantial specificity disparities (\(\Delta_{spec}\) up to 18% and 15%, respectively), which places them at higher risk of misclassification than their counterparts. These disparities reflect representational biases, defined here as systematic differences in model performance across demographic or clinical subgroups. Depression detection among CI subjects yields lower overall performance, with mild improvements from low- and mid-level W2V2 layers. Cross-task generalization between CI and depression classification is limited, indicating that the two tasks depend on distinct representations. These findings emphasize the need for fairness-aware model evaluation and subgroup-specific analysis in clinical speech applications, particularly given the demographic and clinical heterogeneity of real-world settings.
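
The abstract reports unweighted average recall (UAR), per-subgroup AUC, and a specificity gap \(\Delta_{spec}\). The sketch below shows one way to compute such subgroup metrics. It is not the authors' evaluation code. Several details are assumptions: label 1 means CI and label 0 means control, and 0 is treated as the negative class for specificity. The decision threshold is 0.5. \(\Delta_{spec}\) is taken as the largest minus the smallest subgroup specificity. The toy labels, scores, and "F"/"M" group tags are invented for illustration and are not DementiaBank data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def recall_per_class(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall for each class label that appears in y_true."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


def uar(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted average recall: the mean of per-class recalls."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TN / (TN + FP), assuming label 0 (control) is the negative class.
    Raises ZeroDivisionError if the subgroup has no negatives."""
    neg_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(1 for p in neg_preds if p == 0) / len(neg_preds)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative
    (Mann-Whitney form; ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_report(
    y_true: List[int], scores: List[float], groups: List[str], threshold: float = 0.5
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Per-subgroup UAR, specificity, and AUC, plus the max-min specificity gap."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    report: Dict[str, Dict[str, float]] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        ys = [scores[i] for i in idx]
        report[g] = {"uar": uar(yt, yp), "spec": specificity(yt, yp), "auc": auc(yt, ys)}
    specs = [m["spec"] for m in report.values()]
    return report, max(specs) - min(specs)


# Toy illustration only: invented labels, classifier scores, and group tags.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3, 0.1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

report, gap = subgroup_report(y_true, scores, groups)
for g, metrics in report.items():
    print(g, metrics)
print("delta_spec:", gap)
```

In this toy case both groups have the same UAR (0.75) and AUC (1.0), but specificity is 0.5 for "F" and 1.0 for "M". The gap is 0.5. A single aggregate metric can therefore hide a subgroup in which controls are more often misclassified as impaired. That is the kind of disparity \(\Delta_{spec}\) is meant to expose.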


1️⃣ One-Sentence Summary

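This study finds that although self-supervised learning models perform well at detecting cognitive impairment from speech, their performance differs substantially across gender, age, and depression-status subgroups, which can put female and younger participants at higher risk of misclassification; it therefore underscores the importance of evaluating model fairness and conducting subgroup analyses in clinical speech applications.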
