PaW-ViT: A Patch-based Warping Vision Transformer for Robust Ear Verification
1️⃣ One-sentence summary
This paper proposes a new method called PaW-ViT, which uses anatomical knowledge to preprocess and warp-align ear images so that Vision Transformer models can recognize ears of different shapes, sizes, and poses more stably and accurately, improving the robustness of ear biometrics.
The rectangular tokens used by vision transformers for visual recognition can degrade performance because patches incorporate background information from outside the object to be recognized. This paper introduces PaW-ViT, a Patch-based Warping Vision Transformer: a preprocessing approach rooted in anatomical knowledge that normalizes ear images to enhance the efficacy of ViT. By aligning token boundaries to detected ear feature boundaries, PaW-ViT gains robustness to variation in shape, size, and pose; by aligning feature boundaries to natural ear curvature, it produces more consistent token representations across ear morphologies. Experiments confirm the effectiveness of PaW-ViT across ViT models of several sizes (ViT-T, ViT-S, ViT-B, ViT-L) and demonstrate robust alignment under shape, size, and pose variation. This work addresses the mismatch between the morphological variation of ear biometrics and the positional sensitivity of transformer architectures, presenting a possible avenue for authentication schemes.
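The abstract gives no implementation details, but the core idea — normalize the image so that the ViT's non-overlapping patch grid falls on the ear rather than on background — can be illustrated with a minimal, hypothetical sketch. The function name, the bounding-box crop (a crude stand-in for the paper's anatomical landmark-based warping), and all parameter choices below are assumptions for illustration only, using plain NumPy:

```python
import numpy as np

def crop_and_patchify(image, bbox, grid=4, patch=8):
    """Illustrative stand-in for PaW-ViT-style preprocessing:
    crop to a detected ear bounding box, resample to a grid*patch
    square so every patch lies on the object, then split into
    non-overlapping patch tokens as a ViT would.

    image: 2-D array (grayscale); bbox: (y0, x0, y1, x1).
    Returns a (grid*grid, patch*patch) token matrix.
    """
    y0, x0, y1, x1 = bbox
    crop = image[y0:y1, x0:x1]
    size = grid * patch
    # Nearest-neighbor resize to the patch grid (no external deps).
    ys = (np.arange(size) * crop.shape[0] / size).astype(int)
    xs = (np.arange(size) * crop.shape[1] / size).astype(int)
    resized = crop[np.ix_(ys, xs)]
    # Patchify: split the square into grid x grid blocks of patch x patch,
    # then flatten each block into one token row.
    tokens = (resized.reshape(grid, patch, grid, patch)
                     .transpose(0, 2, 1, 3)
                     .reshape(grid * grid, patch * patch))
    return tokens

# Usage: a 64x64 image with the "ear" detected inside (8, 8)-(56, 56).
image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
tokens = crop_and_patchify(image, (8, 8, 56, 56))
print(tokens.shape)  # (16, 64): 16 tokens, each an 8x8 patch
```

The design point this sketches is the one the abstract makes: once the crop (or, in the paper, the anatomical warp) fixes the object inside the patch grid, the same anatomical region maps to the same token position regardless of the original ear's size or pose, which is what reduces the transformer's positional sensitivity.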
Source: arXiv: 2601.19771