arXiv submission date: 2026-01-29
📄 Abstract - VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models

Large multimodal models (LMMs) have demonstrated outstanding capabilities in various visual perception tasks, which in turn makes the evaluation of LMMs significant. However, video aesthetic quality assessment, a fundamental ability for humans, remains underexplored for LMMs. To address this, we introduce VideoAesBench, a comprehensive benchmark for evaluating LMMs' understanding of video aesthetic quality. VideoAesBench has several significant characteristics: (1) Diverse content: 1,804 videos drawn from multiple sources, including user-generated (UGC), AI-generated (AIGC), compressed, robot-generated (RGC), and game videos. (2) Multiple question formats: traditional single-choice questions, multiple-choice questions, true-or-false questions, and novel open-ended questions for video aesthetics description. (3) Holistic video aesthetics dimensions: visual-form questions covering 5 aspects, visual-style questions covering 4 aspects, and visual-affectiveness questions covering 3 aspects. Based on VideoAesBench, we benchmark 23 open-source and commercial large multimodal models. Our findings show that current LMMs possess only basic video aesthetics perception ability; their performance remains incomplete and imprecise. We hope VideoAesBench can serve as a strong testbed and offer insights for explainable video aesthetics assessment.

Top-level tags: multi-modal model evaluation benchmark
Detailed tags: video aesthetics, large multimodal models, quality assessment, evaluation benchmark, aesthetic perception

VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models


1️⃣ One-Sentence Summary

This paper introduces VideoAesBench, a comprehensive benchmark for systematically evaluating how well current large multimodal models understand video aesthetic quality, finding that they currently possess only basic and imperfect perception abilities in this regard.

Source: arXiv 2601.21915