Riddle Quest : The Enigma of Words
1️⃣ One-sentence summary
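This paper designs a system that automatically generates and evaluates analogy-based riddles, and uses it to test whether large language models can find every possible answer to a riddle. The models usually guess the main intended answer but often overlook other reasonable interpretations, which suggests that riddles can serve as an effective tool for assessing how thoroughly AI models reason and how well they handle ambiguity.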
Riddles are concise linguistic puzzles that describe an object or idea through indirect, figurative, or playful clues. They are a longstanding form of creative expression, requiring the solver to interpret hints, recognize patterns, and draw inferences to identify the answers. In this work, we introduce a simple pipeline for creating and evaluating analogy-based riddles. The system includes a triples creator that builds structured facts about a concept, a semantic mapper that selects attributes useful for analogy, a stylized generator that turns them into riddle clues, and a validator that collects all possible answers the riddle could point to. We use this validator to study whether large language models can recover the full answer set for different riddle types. Our case study shows that while models often guess the main intended answer, they frequently miss other valid interpretations. This highlights the value of riddles as a lightweight tool for examining reasoning coverage and ambiguity handling in language models.
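The four stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the knowledge base `TOY_KB`, the function names, the rule "keep attributes shared with another concept", and the `coverage` metric (fraction of the valid answer set a solver recovers) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str  # written in first person so it can be reused as a clue
    obj: str


# Hypothetical toy knowledge base (an assumption, not the paper's fact source).
TOY_KB = {
    "clock": [("have", "hands"), ("have", "a face"), ("can", "tell time")],
    "watch": [("have", "hands"), ("have", "a face"), ("can", "tell time")],
    "person": [("have", "hands"), ("have", "a face"), ("can", "talk")],
    "river": [("have", "a mouth"), ("can", "run")],
}


def create_triples(concept):
    """Triples creator: build structured facts about one concept."""
    return [Triple(concept, rel, obj) for rel, obj in TOY_KB[concept]]


def map_attributes(triples, k=2):
    """Semantic mapper (assumed rule): keep up to k attributes that at
    least one other concept also has, since those invite analogy."""
    selected = []
    for t in triples:
        shared = any(
            (t.relation, t.obj) in facts
            for name, facts in TOY_KB.items()
            if name != t.subject
        )
        if shared:
            selected.append(t)
    return selected[:k]


def generate_riddle(selected):
    """Stylized generator: turn the selected attributes into clues."""
    clues = " and ".join(f"I {t.relation} {t.obj}" for t in selected)
    return f"{clues}. What am I?"


def validate(selected):
    """Validator: collect every concept that satisfies all the clues."""
    required = {(t.relation, t.obj) for t in selected}
    return {name for name, facts in TOY_KB.items() if required <= set(facts)}


def coverage(predicted, valid):
    """Assumed metric: share of the valid answer set that was recovered."""
    return len(set(predicted) & valid) / len(valid) if valid else 0.0


selected = map_attributes(create_triples("clock"))
riddle = generate_riddle(selected)  # "I have hands and I have a face. What am I?"
valid = validate(selected)          # {"clock", "watch", "person"}
score = coverage({"clock"}, valid)  # a solver naming only the intended answer
```

In this toy setup a solver that answers only "clock" recovers one of three valid answers, which mirrors the kind of gap the abstract describes between guessing the intended answer and recovering the full answer set.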
Source: arXiv: 2601.19273