📄
Abstract - SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance
Reconstructing dynamic visual experiences from brain activity provides a compelling avenue for exploring the neural mechanisms of human visual perception. While recent progress in fMRI-based image reconstruction has been notable, extending this success to video reconstruction remains a significant challenge. Current fMRI-to-video reconstruction approaches consistently encounter two major shortcomings: (i) inconsistent visual representations of salient objects across frames, leading to appearance mismatches; (ii) poor temporal coherence, resulting in motion misalignment or abrupt frame transitions. To address these limitations, we introduce SemVideo, a novel fMRI-to-video reconstruction framework guided by hierarchical semantic information. At the core of SemVideo is SemMiner, a hierarchical guidance module that constructs three levels of semantic cues from the original video stimulus: static anchor descriptions, motion-oriented narratives, and holistic summaries. Leveraging this semantic guidance, SemVideo comprises three key components: a Semantic Alignment Decoder that aligns fMRI signals with CLIP-style embeddings derived from SemMiner, a Motion Adaptation Decoder that reconstructs dynamic motion patterns using a novel tripartite attention fusion architecture, and a Conditional Video Render that leverages hierarchical semantic guidance for video reconstruction. Experiments conducted on the CC2017 and HCP datasets demonstrate that SemVideo achieves superior performance in both semantic alignment and temporal consistency, setting a new state-of-the-art in fMRI-to-video reconstruction.
SemVideo:通过分层语义引导从大脑活动中重建你所观看的内容 /
SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance
1️⃣ 一句话总结
这篇论文提出了一个名为SemVideo的新框架,它利用从原始视频中提取的多层次语义信息(如静态描述、运动叙事和整体摘要)来引导解码大脑活动信号,从而更准确、连贯地重建出人们观看的动态视频内容,解决了以往方法在物体外观一致性和时间连贯性上的不足。