arXiv submission date: 2026-03-16
📄 Abstract - Spectrogram features for audio and speech analysis

Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis as well. Initially, the primary motivator for spectrogram-based representations was their ability to present sound as a two-dimensional signal in the time-frequency plane. This not only provides an interpretable physical basis for analysing sound, but also unlocks a wide range of machine learning techniques originally developed for image processing, such as convolutional neural networks. A spectrogram is a matrix characterised by the resolution and span of its two dimensions, as well as by the representation and scaling of each element. Researchers across numerous application areas have explored many possibilities for these three characteristics, with different settings showing affinity for different tasks. This paper reviews the use of spectrogram-based representations and surveys the state of the art to ask how the choice of front-end feature representation relates to the back-end classifier architecture for different tasks.
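To make the three characteristics in the abstract concrete (resolution/span of the two axes, and the representation/scaling of each element), here is a minimal pure-Python sketch. It assumes a periodic Hann window, a naive DFT and a linear frequency axis; the function names `resolution` and `spectrogram` and the `scaling` options are hypothetical illustrations, not the paper's method. Mel or other warped frequency scales are deliberately omitted.

```python
import cmath
import math


def resolution(sample_rate, frame_len, hop):
    """Return (Hz per frequency bin, seconds per time frame).

    Hypothetical helper: frame length sets frequency resolution,
    hop size sets time resolution.
    """
    return sample_rate / frame_len, hop / sample_rate


def hann(n):
    # Periodic Hann window of length n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=256, hop=128, scaling="power", eps=1e-10):
    """Return a (frames x bins) matrix as a list of lists.

    Hypothetical sketch: the frequency span covers 0 Hz up to Nyquist
    (frame_len // 2 + 1 bins); `scaling` picks the element representation.
    """
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT coefficient for bin k (O(N^2), fine for a sketch)
            x = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            mag = abs(x)
            if scaling == "magnitude":
                row.append(mag)
            elif scaling == "power":
                row.append(mag ** 2)
            elif scaling == "db":
                # Log (decibel) power; eps avoids log10(0)
                row.append(10 * math.log10(mag ** 2 + eps))
            else:
                raise ValueError(f"unknown scaling: {scaling}")
        frames.append(row)
    return frames


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1024)]
    spec = spectrogram(tone, frame_len=256, hop=128, scaling="db")
    hz_per_bin, sec_per_frame = resolution(sr, 256, 128)
    peak = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(len(spec), "frames x", len(spec[0]), "bins; peak at",
          peak * hz_per_bin, "Hz")
```

With an 8 kHz sample rate, a 256-sample frame gives 31.25 Hz bins and a 128-sample hop gives 16 ms frames. Doubling `frame_len` halves the bin spacing but coarsens time resolution, which is the trade-off the abstract refers to when it speaks of the resolution of the two dimensions.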

Top-level tags: audio, machine learning, model training
Detailed tags: spectrogram, feature engineering, speech processing, convolutional neural networks, time-frequency analysis

Spectrogram features for audio and speech analysis


1️⃣ One-sentence summary

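This paper reviews how spectrogram features serve as the core front-end representation in deep learning audio analysis systems and examines how their parameter settings interact with back-end classifier architectures across different tasks, with the aim of guiding researchers in feature selection.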

Source: arXiv: 2603.14917