arXiv submission date: 2026-02-11
📄 Abstract - GENIUS: Generative Fluid Intelligence Evaluation Suite

Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess $\textit{Crystallized Intelligence}$, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks $\textit{Generative Fluid Intelligence (GFI)}$: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce $\textbf{GENIUS}$ ($\textbf{GEN}$ Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives. These include $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and $\textit{Adapting to Contextual Knowledge}$ (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits in these tasks. Crucially, our diagnostic analysis disentangles these failure modes. It demonstrates that deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, $\textbf{GENIUS}$ establishes a rigorous standard for $\textit{GFI}$, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: this https URL.

Top-level tags: multi-modal model evaluation benchmark
Detailed tags: fluid intelligence visual generation context comprehension evaluation suite multimodal reasoning

GENIUS: Generative Fluid Intelligence Evaluation Suite


1️⃣ One-sentence summary

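This paper introduces GENIUS, a new evaluation benchmark that tests whether AI models facing entirely new, unseen situations can reason flexibly, induce patterns, and create new content as humans do, rather than merely relying on an existing knowledge base; the results show that current mainstream models still fall clearly short in this capability.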

Source: arXiv: 2602.11144