📄 Abstract - The Collapse of Patches

Observing certain patches in an image reduces the uncertainty of others. Their realization lowers the distribution entropy of each remaining patch feature, analogous to collapsing a particle's wave function in quantum mechanics. This phenomenon can intuitively be called patch collapse. To identify which patches are most relied on during a target region's collapse, we learn an autoencoder that softly selects a subset of patches to reconstruct each target patch. Building a graph from these learned dependencies and computing each patch's PageRank score reveals the optimal patch order to realize an image. We show that respecting this order benefits various masked image modeling methods. First, autoregressive image generation can be boosted by retraining the state-of-the-art model MAR. Next, we introduce a new setup for image classification by exposing Vision Transformers only to high-rank patches in the collapse order. Seeing 22% of such patches is sufficient to achieve high accuracy. With these experiments, we propose patch collapse as a novel image modeling perspective that promotes vision efficiency. Our project is available at this https URL.
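The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dependency matrix below is random stand-in data rather than the output of a learned autoencoder, the edge direction (target patch points to the patches it relies on), the damping factor of 0.85, and the function names `pagerank`, `collapse_order`, and `keep_top_patches` are all assumptions made for illustration.

```python
import numpy as np


def pagerank(weights, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a dense weighted graph.

    Assumption: weights[i, j] is how strongly target patch i relies on
    source patch j, i.e. an edge i -> j, so heavily relied-on patches
    collect high scores.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    row_sums = w.sum(axis=1, keepdims=True)
    has_out = row_sums > 0
    # Normalize each row into a probability distribution; rows with no
    # outgoing weight are treated as pointing uniformly to every patch.
    transition = np.where(has_out, w / np.where(has_out, row_sums, 1.0), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_rank = (1.0 - damping) / n + damping * (transition.T @ rank)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank


def collapse_order(weights):
    """Patch indices sorted from highest to lowest PageRank score."""
    return np.argsort(-pagerank(weights), kind="stable")


def keep_top_patches(order, fraction):
    """Keep the leading fraction of an ordering (at least one patch)."""
    k = max(1, int(round(fraction * len(order))))
    return order[:k]


# Stand-in dependency matrix for a 14 x 14 patch grid (196 patches).
rng = np.random.default_rng(0)
dependencies = rng.random((196, 196))
order = collapse_order(dependencies)
visible = keep_top_patches(order, 0.22)  # indices a classifier would see
```

In this sketch, `visible` holds the indices of the highest-ranked 22% of patches; how those patches are then fed to a Vision Transformer is outside what the abstract specifies.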

Top-level tags: computer vision, model training, machine learning
Detailed tags: patch collapse, vision transformers, masked image modeling, autoregressive generation, image reconstruction

The Collapse of Patches


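1️⃣ One-sentence summary

This paper proposes a new perspective called "patch collapse": by analyzing the mutual dependencies between different regions of an image, it determines an optimal order in which to observe them, so that a model can perform image generation and recognition efficiently while seeing only part of the image.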

