arXiv submission date: 2026-03-19
📄 Abstract - Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models

Puns are a distinctive linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Within pun research, audio plays a central role in human communication beyond text and images, yet datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio-language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word localization, and pun meaning inference. We conduct a deep analysis of APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key challenges, such as positional biases in audio pun localization and characteristic error cases in meaning inference, offering actionable insights for advancing humour-aware audio intelligence.

Top-level tags: natural language processing, audio, benchmark
Detailed tags: audio pun understanding, large audio-language models, humour detection, multimodal evaluation, spoken language processing

Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models


1️⃣ One-sentence summary

This paper introduces APUN-Bench, the first benchmark dedicated to evaluating how well large audio-language models understand audio puns. Systematic testing reveals that current models fall notably short at recognizing, localizing, and interpreting spoken puns, offering key insights for improving AI understanding of humorous speech.

Source: arXiv:2603.18678