PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection

📄 Abstract - PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection

Speech-based analysis offers a scalable and non-invasive approach for detecting cognitive decline, yet progress has been constrained by the limited availability of clinically validated datasets collected under realistic conditions. We introduce PROCESS-2, a large-scale speech dataset designed to support research on automatic assessment of cognitive impairment from spontaneous and task-oriented speech. The dataset comprises recordings from 200 healthy controls, 150 participants with mild cognitive impairment, and 50 participants with a dementia diagnosis, collected using the CognoMemory digital assessment platform. Each participant completed a single assessment session, including picture description and verbal fluency tasks, accompanied by manually verified transcripts and participant-level metadata. PROCESS-2 contains approximately 21 hours of speech audio with predefined train/test partitions. Comprehensive technical validation evaluated demographic balance, clinical consistency, recording stability, embedding-space structure, and reproducible baseline modelling performance, demonstrating clinically meaningful group separation and stable performance across modelling approaches while preserving real-world conversational variability. PROCESS-2 is released under controlled access via Hugging Face to enable responsible reuse while protecting participant privacy, providing a reproducible benchmark resource for speech-based cognitive assessment research.
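The cohort figures in the abstract imply a class imbalance that matters when reading benchmark results. The short sketch below works through that arithmetic. The variable names are illustrative and not part of the dataset, and the per-participant average is only a rough figure, since the 21-hour total is approximate and individual session lengths will vary.

```python
# Cohort sizes as reported in the abstract.
cohort = {
    "healthy_control": 200,
    "mild_cognitive_impairment": 150,
    "dementia": 50,
}
total_hours = 21  # approximate total audio duration reported in the abstract

# Each participant completed a single session, so participants == sessions.
total_participants = sum(cohort.values())

# Share of each diagnostic group in the full cohort.
shares = {group: count / total_participants for group, count in cohort.items()}

# Rough mean audio per participant, assuming the approximate 21-hour total.
mean_minutes = total_hours * 60 / total_participants

print(f"participants: {total_participants}")
for group, share in shares.items():
    print(f"{group}: {share:.1%}")
print(f"mean audio per participant: {mean_minutes:.2f} min")
```

Under these figures, half of the participants are healthy controls and one in eight has a dementia diagnosis, so class-aware metrics are likely more informative than plain accuracy.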


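1️⃣ One-Sentence Summary

This paper introduces PROCESS-2, a large-scale speech dataset containing approximately 21 hours of recordings from 200 healthy controls, 150 participants with mild cognitive impairment, and 50 participants with dementia, collected through picture description and verbal fluency tasks. It aims to provide a reliable, reproducible benchmark resource for automatic speech-based detection of cognitive impairment.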
