MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning
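1️⃣ One-sentence summary
This paper proposes a lightweight multi-agent collaborative framework that uses knowledge distillation and test-time adaptation to efficiently extract and exploit long-tail event information in social intelligence reasoning, reaching state-of-the-art results with only about 30% of the training data.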
We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM) and designed specifically for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, multi-modal data pertinent to social intelligence is precisely localized. Relevant long-tail events are then identified, extracted, and rendered as formatted, explicit text. This formatting strategy prevents critical long-tail information from being overshadowed by head events and environmental noise during tokenization. We also integrate Test-Time Adaptation (TTA) across the entire reasoning pipeline, which encompasses the extraction and representation of long-tail events, Chain-of-Thought (CoT) prompting, and self-reflection. The TTA mechanism is likewise distillation-enhanced and uses Low-Rank Adaptation (LoRA) to fine-tune the foundation model exclusively for instance-level reasoning. Extensive evaluations against open-source and proprietary AI models across multiple benchmarks demonstrate the effectiveness of the proposed framework. Using around 30% of the training data from IntentTrain, we achieve state-of-the-art results. The code, a demo, the LoRA weights, and the dataset used to train the router are publicly available.
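The abstract says only that LoRA is used for instance-level test-time adaptation and gives no implementation details. As a rough illustration of why LoRA suits per-instance updates, the sketch below shows the generic LoRA idea: a frozen weight matrix plus a small trainable low-rank update. The class name `LoRALinear`, the helper `matvec`, and all hyperparameters are hypothetical and are not taken from the paper's code.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA, not the paper's implementation.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        # A starts with small random values and B starts at zero,
        # so the adapted layer initially matches the base layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameter_count(self):
        # Only A and B would be updated during adaptation.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 identity base weight with a rank-1 adapter.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
print(layer.forward([1, 2, 3, 4]))
print(layer.trainable_parameter_count(), "trainable values vs", 4 * 4, "frozen")
```

Because only `A` and `B` change, an adapter of this kind can be trained on a single instance and then discarded, leaving the base weights untouched. Whether the paper resets its adapter between instances is not stated in the abstract.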
Source: arXiv:2606.12018