arXiv submission date: 2026-04-27
📄 Abstract - Omni-o3: Deep Nested Omnimodal Deduction for Deliberative Audio-Visual Reasoning

Omnimodal understanding entails a massive, highly redundant search space of cross-modal interactions, demanding focused and deliberative reasoning. Current reasoning paradigms rely on either sequential step-by-step generation or parallel sample-by-sample rollouts, leading to isolated reasoning trajectories. This inability to share promising intermediate paths severely limits exploration efficiency and causes compounding errors in complex audio-visual tasks. To break this bottleneck, we introduce Omni-o3, a novel framework driven by a deep nested deduction policy. By formulating reasoning as a dynamic recursive search, Omni-o3 inherently shares reasoning prefixes across branches, enabling the iterative execution of four atomic cognitive actions: expansion, selection, simulation, and backpropagation. To empower this framework, we propose a robust two-stage training paradigm: (1) cold-start supervised fine-tuning on 101K high-quality, long-chain trajectories distilled from 3.5M diverse omnimodal samples, enabling necessary recursive search patterns; and (2) nested group rollout-driven exploratory reinforcement learning on 18K complex multi-turn samples, explicitly guided by a novel multi-step reward model to stimulate deep nested reasoning. Extensive experiments demonstrate that Omni-o3 achieves competitive performance across 11 benchmarks, unlocking advanced capabilities in comprehensive audio-visual, visual-centric, and audio-centric reasoning tasks.
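The four actions named in the abstract (expansion, selection, simulation, backpropagation) are also the four phases of classical Monte Carlo tree search. The sketch below is a generic MCTS loop on a toy problem, not the Omni-o3 policy itself. It only illustrates how a search tree lets sibling branches share a reasoning prefix. Every name in it (`Node`, `ACTIONS`, `TARGET`, `reward`, the UCB1 constant) is a hypothetical placeholder and not something taken from the paper.

```python
import math
import random

ACTIONS = (0, 1, 2)        # hypothetical stand-ins for candidate reasoning steps
TARGET = (2, 0, 1, 2)      # hypothetical best trajectory, used only by the toy reward
DEPTH = len(TARGET)


class Node:
    def __init__(self, prefix=(), parent=None):
        self.prefix = prefix   # steps shared by every branch below this node
        self.parent = parent
        self.children = {}     # action -> Node
        self.visits = 0
        self.value = 0.0       # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.prefix) == DEPTH

    def untried(self):
        return [a for a in ACTIONS if a not in self.children]


def reward(sequence):
    # Toy reward: fraction of positions that agree with TARGET.
    return sum(s == t for s, t in zip(sequence, TARGET)) / DEPTH


def select(node, c=1.4):
    # Selection: descend with UCB1 while the current node is fully expanded.
    while not node.is_terminal() and not node.untried():
        parent_visits = node.visits
        node = max(
            node.children.values(),
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(parent_visits) / ch.visits),
        )
    return node


def expand(node, rng):
    # Expansion: add one child that extends the shared prefix by one step.
    if node.is_terminal():
        return node
    action = rng.choice(node.untried())
    child = Node(node.prefix + (action,), parent=node)
    node.children[action] = child
    return child


def simulate(node, rng):
    # Simulation: complete the trajectory with random steps and score it.
    tail = tuple(rng.choice(ACTIONS) for _ in range(DEPTH - len(node.prefix)))
    return reward(node.prefix + tail)


def backpropagate(node, r):
    # Backpropagation: credit the reward to the node and all of its ancestors.
    while node is not None:
        node.visits += 1
        node.value += r
        node = node.parent


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        leaf = expand(select(root), rng)
        backpropagate(leaf, simulate(leaf, rng))
    return root


def best_trajectory(root):
    # Read out the answer by following the most-visited child at each level.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.prefix


if __name__ == "__main__":
    tree = search()
    print(best_trajectory(tree), "target:", TARGET)
```

Because each child stores `node.prefix + (action,)`, a reward found deep in one branch updates the statistics of every ancestor prefix, which is the sense in which branches share intermediate paths here. In this toy setting the most-visited path tends to move toward `TARGET` as the iteration count grows. How Omni-o3 realizes these actions inside a model, and how its multi-step reward model scores trajectories, is not specified in the abstract.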

Top-level tags: multi-modal, model training, reinforcement learning
Detailed tags: audio-visual reasoning, recursive search, deduction policy, reward model, omnimodal understanding

Omni-o3: Deep Nested Omnimodal Deduction for Deliberative Audio-Visual Reasoning


1️⃣ One-sentence summary

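This paper proposes Omni-o3, a new AI framework that models reasoning as a dynamic recursive search built on four cognitive actions (expansion, selection, simulation, and backpropagation) and, combined with a two-stage training strategy, improves the model's ability to cope with redundant information and avoid compounding reasoning errors in complex audio-visual tasks, achieving competitive performance across 11 benchmarks.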

Source: arXiv:2604.24191