Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization
1️⃣ One-Sentence Summary
This paper proposes a new Vision-Transformer-based method that transforms the style of medical images while keeping key anatomical structures unchanged, generating large amounts of realistic, artifact-free, and diverse training data, and thereby improving a model's generalization and diagnostic accuracy on data from different hospitals, devices, or populations.
Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles, but either fall short in style diversity or introduce artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct image classification tasks in the context of histopathology and dermatology. Results demonstrate improved robustness (up to +13% accuracy) over the state of the art while generating perceptually convincing images without artifacts. Additionally, we show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation. The source code is available at this https URL.
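To make the weight-sharing idea concrete, below is a minimal PyTorch sketch of an attention block whose single set of projection weights serves both roles: with no style input it runs self-attention over the content tokens (structure-preserving), and with style tokens it runs cross-attention using those tokens as keys and values (style-injecting). This is an illustration under assumptions, not the authors' implementation; the dimensions, normalization placement, and the decision to reuse one LayerNorm for both token streams are all assumed here.

```python
from typing import Optional

import torch
import torch.nn as nn


class SharedAttentionBlock(nn.Module):
    """One attention block reused for self- and cross-attention.

    Illustrative sketch: the same nn.MultiheadAttention weights compute
    self-attention over content tokens (anatomical consistency) or
    cross-attention with style tokens as keys/values (style transfer).
    """

    def __init__(self, dim: int = 384, num_heads: int = 6):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(
        self, content: torch.Tensor, style: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        # Queries always come from the content tokens; keys/values come
        # from the content (self-attention) or from the style tokens
        # (cross-attention) -- same projection weights either way.
        q = self.norm(content)
        kv = q if style is None else self.norm(style)
        attended, _ = self.attn(q, kv, kv)
        x = content + attended
        return x + self.mlp(x)


# Toy usage: one block alternates between the two attention modes.
block = SharedAttentionBlock()
content_tokens = torch.randn(2, 196, 384)  # patch tokens of the content image
style_tokens = torch.randn(2, 196, 384)    # patch tokens of a style image
out_self = block(content_tokens)                  # self-attention pass
out_cross = block(content_tokens, style_tokens)   # cross-attention pass
```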
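The abstract's test-time augmentation result suggests a simple inference recipe: stylize the test image into several reference styles and average the classifier's predictions over the variants. The sketch below shows that pattern under assumptions; `stylizer(content, style)` is a hypothetical stand-in for the trained Stylizing ViT, and the averaging scheme is one plausible choice, not a detail given in the abstract.

```python
import torch


@torch.no_grad()
def tta_predict(classifier, stylizer, image, style_refs):
    """Average class probabilities over stylized variants of `image`.

    Hypothetical sketch: `stylizer(content, style)` stands in for the
    trained Stylizing ViT; `style_refs` is a list of reference style
    images. The original image is included among the variants.
    """
    variants = [image] + [stylizer(image, s) for s in style_refs]
    probs = torch.stack([classifier(v).softmax(dim=-1) for v in variants])
    return probs.mean(dim=0)  # averaged prediction over all variants
```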
Source: arXiv: 2601.17586