arXiv submission date: 2026-06-02
📄 Abstract - Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection

Speech emotion recognition is an important component of modern human-computer interaction systems. However, many state-of-the-art approaches rely on large pretrained models with high computational and memory requirements, limiting their applicability. This paper proposes ResLSTM-SA, a lightweight architecture that integrates residual connections with soft attention within an LSTM-based framework. Evaluated on the RAVDESS dataset under strict speaker-independent partitioning, the proposed model outperforms conventional attention-based LSTM baselines and several previously reported CNN- and hybrid CNN-LSTM architectures in terms of unweighted average recall (UAR). The best-performing variant (ResLSTM-SA-h64) achieves a maximum UAR of 0.6517 with only 46.8k trainable parameters, delivering competitive accuracy with three orders of magnitude fewer parameters than large-scale self-supervised alternatives, thereby enabling efficient deployment on edge devices and real-time voice assistants. The source code is available at this https URL.
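The abstract reports results as unweighted average recall (UAR), the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric in plain Python. The function name and the toy labels are illustrative assumptions and are not taken from the paper's code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example (hypothetical labels, not RAVDESS data).
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR={uar:.3f}, accuracy={accuracy:.3f}")
```

On this toy data the recall is 3/4 for "angry" and 1/2 for "sad", so UAR is 0.625 while plain accuracy is about 0.667. The gap shows why UAR is often preferred when classes are imbalanced.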

Top-level tags: audio, machine learning, model training
Detailed tags: speech emotion recognition, LSTM, attention mechanism, residual connections, lightweight model

Speech Emotion Recognition using Attention-based LSTM-Network with Residual Connection


1️⃣ One-sentence summary

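This paper proposes ResLSTM-SA, a lightweight speech emotion recognition model that integrates residual connections and soft attention into an LSTM network, keeping accuracy competitive while cutting the parameter count to about one-thousandth of that of large-scale models, which makes it suitable for real-time use on edge devices such as phones and smart speakers.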
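To make the two building blocks named here concrete, below is a generic sketch of a residual connection (adding a layer's input to its output) followed by soft attention pooling (a softmax-weighted sum over time steps). This page does not describe how ResLSTM-SA wires these together, so the function names, the dot-product scoring vector `w`, and the toy numbers are all assumptions for illustration. The LSTM itself is replaced by fixed stand-in outputs.

```python
import math


def residual_add(x, fx):
    """Residual connection: element-wise sum of a layer's input and output."""
    return [a + b for a, b in zip(x, fx)]


def soft_attention_pool(hidden_states, w):
    """Pool T hidden vectors into one context vector.

    hidden_states: list of T vectors, each of length H.
    w: scoring vector of length H (assumed learnable in a real model).
    Returns (context, alphas), where alphas are softmax weights over time.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(a * h[j] for a, h in zip(alphas, hidden_states)) for j in range(dim)
    ]
    return context, alphas


# Toy sequence of 3 time steps with 2 features each (hypothetical values).
layer_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Stand-in for what an LSTM layer might output for the same steps.
layer_out = [[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]]

hidden = [residual_add(x, fx) for x, fx in zip(layer_in, layer_out)]
context, alphas = soft_attention_pool(hidden, w=[1.0, 0.0])
print(hidden, alphas, context)
```

The attention weights sum to 1, and the time step with the highest score (here the third) contributes most to the pooled vector that a classifier head would consume.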

Source: arXiv: 2606.03359