arXiv submission date: 2026-02-09
📄 Abstract - PTS-SNN: A Prompt-Tuned Temporal Shift Spiking Neural Networks for Efficient Speech Emotion Recognition

Speech Emotion Recognition (SER) is widely deployed in Human-Computer Interaction, yet the high computational cost of conventional models hinders their implementation on resource-constrained edge devices. Spiking Neural Networks (SNNs) offer an energy-efficient alternative due to their event-driven nature; however, their integration with continuous Self-Supervised Learning (SSL) representations is fundamentally challenged by distribution mismatch, where high-dynamic-range embeddings degrade the information coding capacity of threshold-based neurons. To resolve this, we propose Prompt-Tuned Spiking Neural Networks (PTS-SNN), a parameter-efficient neuromorphic adaptation framework that aligns frozen SSL backbones with spiking dynamics. Specifically, we introduce a Temporal Shift Spiking Encoder to capture local temporal dependencies via parameter-free channel shifts, establishing a stable feature basis. To bridge the domain gap, we devise a Context-Aware Membrane Potential Calibration strategy. This mechanism leverages a Spiking Sparse Linear Attention module to aggregate global semantic context into learnable soft prompts, which dynamically regulate the bias voltages of Parametric Leaky Integrate-and-Fire (PLIF) neurons. This regulation effectively centers the heterogeneous input distribution within the responsive firing range, mitigating functional silence or saturation. Extensive experiments on five multilingual datasets (e.g., IEMOCAP, CASIA, EMODB) demonstrate that PTS-SNN achieves 73.34% accuracy on IEMOCAP, comparable to competitive Artificial Neural Networks (ANNs), while requiring only 1.19M trainable parameters and 0.35 mJ inference energy per sample.
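
The two mechanisms named in the abstract, parameter-free channel shifts and bias regulation of PLIF neurons, can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The function names, the `fold_div` parameter, the zero padding, and the way `bias` enters the membrane update are all assumptions made for illustration; the shift follows the general pattern of temporal shift modules, and the neuron uses the common PLIF form in which the inverse time constant is a sigmoid of a learnable scalar `w`.

```python
import math


def temporal_shift(frames, fold_div=4):
    """Shift a fraction of channels by one time step (illustrative only).

    frames: list of T frames, each a list of C floats.
    The first C // fold_div channels read from the previous frame,
    the next C // fold_div channels read from the next frame,
    and the remaining channels are left in place. Ends are zero-padded.
    No learnable parameters are involved.
    """
    T = len(frames)
    C = len(frames[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                src = t - 1  # value comes from the previous frame
            elif c < 2 * fold:
                src = t + 1  # value comes from the next frame
            else:
                src = t      # unshifted channel
            if 0 <= src < T:
                out[t][c] = frames[src][c]
    return out


def plif_spikes(inputs, w=0.0, bias=0.0, v_threshold=1.0, v_reset=0.0):
    """Run one PLIF-style neuron over a sequence of input currents.

    1/tau = sigmoid(w), where w would be learnable in training.
    `bias` is a hypothetical additive offset standing in for the
    prompt-regulated bias voltage described in the abstract.
    """
    inv_tau = 1.0 / (1.0 + math.exp(-w))
    v = v_reset
    spikes = []
    for x in inputs:
        # Leaky integration toward the biased input
        v = v + inv_tau * (x + bias - (v - v_reset))
        if v >= v_threshold:
            spikes.append(1)
            v = v_reset  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes
```

With a constant toy input of 0.5 over four steps and `w=0.0`, a bias of 0.0 never reaches the threshold (silence), a bias of 2.0 fires on every step (saturation), and a bias of 1.0 fires on every second step. That is the kind of centering effect the calibration strategy is described as aiming for, shown here only on made-up numbers.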

Top-level tags: audio, model training, machine learning
Detailed tags: speech emotion recognition, spiking neural networks, efficient inference, self-supervised learning, parameter-efficient tuning

PTS-SNN: A Prompt-Tuned Temporal Shift Spiking Neural Networks for Efficient Speech Emotion Recognition


1️⃣ One-sentence summary

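This paper proposes PTS-SNN, an efficient neural network model that uses prompt tuning and temporal shift techniques to combine energy-efficient spiking neural networks with powerful pretrained speech models, maintaining high accuracy while substantially reducing the computational cost and energy consumption of speech emotion recognition, which makes it better suited to resource-constrained devices such as mobile phones.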

Source: arXiv: 2602.08240