ReHyAt: Recurrent Hybrid Attention for Video Diffusion Transformers
1️⃣ One-Sentence Summary
This paper proposes ReHyAt, a novel recurrent hybrid attention mechanism that combines high-fidelity softmax attention with high-efficiency linear attention. It preserves state-of-the-art video generation quality while reducing the computational cost of attention from quadratic to linear, making it practical to generate longer videos or to run generation on small devices.
Recent advances in video diffusion models have shifted towards transformer-based architectures, achieving state-of-the-art video generation but at the cost of quadratic attention complexity, which severely limits scalability for longer sequences. We introduce ReHyAt, a Recurrent Hybrid Attention mechanism that combines the fidelity of softmax attention with the efficiency of linear attention, enabling a chunk-wise recurrent reformulation with constant memory usage. Unlike the concurrent linear-only SANA Video, ReHyAt's hybrid design allows efficient distillation from existing softmax-based models, reducing the training cost by two orders of magnitude to ~160 GPU hours while remaining competitive in quality. Our lightweight distillation and finetuning pipeline provides a recipe that can be applied to future state-of-the-art bidirectional softmax-based models. Experiments on VBench and VBench-2.0, as well as a human preference study, demonstrate that ReHyAt achieves state-of-the-art video quality while reducing attention cost from quadratic to linear, unlocking practical scalability for long-duration and on-device video generation. The project page is available at this https URL.
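To make the abstract's "chunk-wise recurrent reformulation" concrete, below is a minimal PyTorch sketch of one plausible hybrid scheme: exact softmax attention inside each chunk, plus a linear-attention state carried recurrently across chunks so that global context costs O(1) memory. This is an illustration only; the function name, the ELU feature map, and the additive fusion of the two paths are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def chunkwise_hybrid_attention(q, k, v, chunk_size=256):
    """Hypothetical hybrid attention sketch (not the paper's exact method).

    Exact softmax attention within each chunk, plus a linear-attention
    recurrent state (S, z) summarizing all previous chunks in O(1) memory.
    q, k, v: (batch, seq_len, dim); seq_len must divide evenly by chunk_size.
    """
    B, T, D = q.shape
    phi = lambda x: F.elu(x) + 1.0        # positive feature map (assumed)
    S = q.new_zeros(B, D, D)              # running sum of phi(k)^T v
    z = q.new_zeros(B, D)                 # running sum of phi(k)
    outs = []
    for s in range(0, T, chunk_size):
        qc = q[:, s:s + chunk_size]
        kc = k[:, s:s + chunk_size]
        vc = v[:, s:s + chunk_size]
        # High-fidelity path: exact softmax attention inside the chunk.
        local = F.scaled_dot_product_attention(qc, kc, vc)
        # Efficient path: linear-attention read of the state from past chunks.
        qf = phi(qc)                                          # (B, C, D)
        glob = torch.einsum('bcd,bde->bce', qf, S)            # (B, C, D)
        denom = torch.einsum('bcd,bd->bc', qf, z)             # (B, C)
        glob = glob / denom.clamp_min(1e-6).unsqueeze(-1)
        outs.append(local + glob)         # naive fusion of both paths (assumed)
        # Recurrent state update: constant memory regardless of seq_len.
        kf = phi(kc)
        S = S + torch.einsum('bcd,bce->bde', kf, vc)
        z = z + kf.sum(dim=1)
    return torch.cat(outs, dim=1)
```

Because the per-chunk softmax cost is fixed and the cross-chunk state has constant size, total cost grows linearly with sequence length, which is the scalability property the abstract claims.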
Source: arXiv:2601.04342