Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
1️⃣ One-Sentence Summary
This paper introduces Demo-ICL, a new task and accompanying benchmark that asks multimodal large language models to rapidly acquire new skills from a small number of instructional video demonstrations and answer related questions; it also develops a new model with a two-stage training strategy to tackle this challenge.
Despite the growing video understanding capabilities of recent Multimodal Large Language Models (MLLMs), existing video benchmarks primarily assess understanding based on models' static, internal knowledge, rather than their ability to learn and adapt to dynamic, novel contexts from a few examples. To bridge this gap, we present Demo-driven Video In-Context Learning, a novel task focused on learning from in-context demonstrations to answer questions about target videos. Alongside this, we propose Demo-ICL-Bench, a challenging benchmark designed to evaluate demo-driven video in-context learning capabilities. Demo-ICL-Bench is constructed from 1200 instructional YouTube videos with associated questions, from which two types of demonstrations are derived: (i) summaries of video subtitles as text demonstrations; and (ii) the corresponding instructional videos as video demonstrations. To effectively tackle this new challenge, we develop Demo-ICL, an MLLM with a two-stage training strategy: video-supervised fine-tuning and information-assisted direct preference optimization, which jointly enhance the model's ability to learn from in-context examples. Extensive experiments with state-of-the-art MLLMs confirm the difficulty of Demo-ICL-Bench, demonstrate the effectiveness of Demo-ICL, and thereby unveil future research directions.
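The abstract describes a prompt protocol: in-context demonstrations (subtitle summaries as text demonstrations, or the instructional videos themselves as video demonstrations) are supplied first, followed by a question about the target video. A minimal sketch of how such a prompt might be assembled is below; the `Demonstration` type, message format, and function names are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of assembling a demo-driven in-context prompt.
# All names and the message structure are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Demonstration:
    # A text demonstration is a summary of the instructional video's
    # subtitles; a video demonstration references the video itself.
    kind: str      # "text" or "video"
    content: str   # subtitle summary, or a path/URI to the video


def build_demo_icl_prompt(demos: List[Demonstration],
                          target_video: str,
                          question: str) -> List[Dict[str, str]]:
    """Place the demonstrations first, then the question about the
    target video, mirroring the in-context learning setup."""
    messages: List[Dict[str, str]] = []
    for i, demo in enumerate(demos, 1):
        label = "subtitle summary" if demo.kind == "text" else "video"
        messages.append({
            "role": "user",
            "content": f"Demonstration {i} ({label}): {demo.content}",
        })
    messages.append({
        "role": "user",
        "content": f"Target video: {target_video}\nQuestion: {question}",
    })
    return messages
```

Under this sketch, swapping `kind` between `"text"` and `"video"` selects which of the benchmark's two demonstration types is fed to the model, while the final message always carries the target-video question.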
Source: arXiv: 2602.08439