arXiv submission date: 2026-06-22
📄 Abstract - PIVOTSBench: Evaluating Fine-Grained Interpersonal Relationship Reasoning in Multimodal Large Language Models

Humans possess an innate ability to understand fine-grained interpersonal relationships, which is central to everyday social interactions. Although such reasoning is inherently multimodal, it remains largely unexplored by existing multimodal large language models (MLLMs). To address this gap, we introduce PIVOTS, the first benchmark built from Social-IQ 2.0 and YouTube data to evaluate MLLMs' ability to predict bidirectional interpersonal relationship dimensions grounded in established psychology research. In addition, PIVOTS includes auxiliary tasks that assess models' ability to identify and leverage the critical visual cues underlying such predictions. We evaluate both proprietary and open-source MLLMs and conduct detailed ablation studies to analyze the effects of visual modalities and explicit social role information in conversational utterances. We further examine how joint and pairwise prediction settings benefit MLLMs in scoring bidirectional PIVOTS dimensions. Project page and resources: this https URL .

Top-level tags: multi-modal, benchmark, model evaluation
Detailed tags: interpersonal relationship, social reasoning, visual cues, multimodal llm, psychology

PIVOTSBench: Evaluating Fine-Grained Interpersonal Relationship Reasoning in Multimodal Large Language Models


1️⃣ One-Sentence Summary

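This paper introduces the PIVOTS benchmark, which combines video, dialogue, and psychology-grounded dimensions to provide the first systematic evaluation of how well multimodal large language models judge bidirectional, fine-grained interpersonal relationships, and it analyzes how visual cues and social roles affect that reasoning.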

Source: arXiv: 2606.23092