arXiv submission date: 2026-06-03
📄 Abstract - Beyond Symmetric Alignment: Spectral Diagnostics of Modality Imbalance in Vision-Language Models in the Medical Domain

Vision-Language Models (VLMs) struggle when applied to medical image-text data, yet the tools available to diagnose this failure remain limited. Existing representation alignment metrics are symmetric, collapsing both modalities into a single score and hiding which modality drives cross-modal degradation. We introduce the Spectral Alignment Score (SAS), an asymmetric metric that projects both modalities onto the principal eigenbasis of an anchor modality and computes eigenvalue-weighted per-eigenmode correlations, resulting in directional scores whose difference quantifies modality information imbalance. We embed SAS within a benchmarking framework evaluating 15 VLMs across natural and medical image-text datasets alongside 6 alignment metrics and bidirectional retrieval. Our experiments show that medical images retain richer structural information than their paired clinical reports, a directional asymmetry invisible to all competing metrics, and that SAS achieves the strongest zero-label correlation with retrieval performance in the medical domain, positioning it as a practical diagnostic tool for clinical deployment. Code is available at this URL: this https URL.
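To make the abstract's description concrete, here is a minimal NumPy sketch of a directional, eigenvalue-weighted alignment score. It is an illustration built only from the abstract's wording, not the authors' implementation. It assumes that image and text embeddings are paired row by row and share one embedding dimension (as in CLIP-style dual encoders), that the "principal eigenbasis" is the top-`k` eigenvectors of the anchor's feature covariance, that the per-eigenmode correlation is a Pearson correlation, and that weights are eigenvalues normalized to sum to one. The function names `directional_score` and `imbalance`, the parameter `k`, and the sign convention of the difference are all hypothetical.

```python
import numpy as np


def directional_score(anchor, other, k=8, eps=1e-12):
    """Hypothetical anchor -> other score (assumed form, not the paper's exact SAS)."""
    anchor = np.asarray(anchor, dtype=float)
    other = np.asarray(other, dtype=float)

    # Center each modality; rows are paired samples, columns are features.
    a = anchor - anchor.mean(axis=0)
    b = other - other.mean(axis=0)

    # Eigendecomposition of the anchor's feature covariance (symmetric matrix).
    cov = a.T @ a / (a.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

    # Keep the k largest eigenmodes of the anchor.
    order = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[order], 0.0, None)
    basis = eigvecs[:, order]

    # Project BOTH modalities onto the anchor's eigenbasis.
    pa = a @ basis
    pb = b @ basis

    # Pearson correlation per eigenmode (columns are already zero-mean).
    num = (pa * pb).sum(axis=0)
    den = np.linalg.norm(pa, axis=0) * np.linalg.norm(pb, axis=0) + eps
    corr = num / den

    # Eigenvalue-weighted average of the per-mode correlations.
    weights = lam / (lam.sum() + eps)
    return float(weights @ corr)


def imbalance(image_emb, text_emb, k=8):
    """Return both directional scores and their difference (sign is an assumption)."""
    s_img = directional_score(image_emb, text_emb, k)  # image as anchor
    s_txt = directional_score(text_emb, image_emb, k)  # text as anchor
    return s_img, s_txt, s_img - s_txt
```

Calling `imbalance(image_emb, text_emb)` on two `(n_samples, dim)` arrays returns the two directional scores and their gap. Because each direction uses a different anchor eigenbasis, the two scores generally differ, which is the property a symmetric metric cannot expose.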

Top-level tags: multi-modal, medical, model evaluation
Detailed tags: vision-language models, modality imbalance, alignment metric, spectral analysis, benchmark

Beyond Symmetric Alignment: Spectral Diagnostics of Modality Imbalance in Vision-Language Models in the Medical Domain


1️⃣ One-sentence summary

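This paper proposes an asymmetric evaluation metric, the Spectral Alignment Score (SAS), which compares how well image and text representations correlate within the feature space of a chosen anchor modality; it reveals that in medical vision-language models the images carry far richer information than their paired text reports, and the metric accurately predicts a model's performance on medical retrieval tasks without any labeled data.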

Source: arXiv: 2606.04613