Abstract - Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio
Clinical audio diagnosis in low-resource settings requires models that identify conditions from minimal examples without large annotated corpora. We propose Federated Self-Contextualization (FSC), a multimodal language model framework for in-context clinical audio diagnosis across federated hospital clients. FSC constructs pseudo-label episodes via unsupervised clustering of audio representations, bypassing scarce real diagnostic labels, and enables contextual reasoning from support-query pairs. Our progressive three-stage pipeline first aligns audio embeddings with the language model via caption-based pretraining, then adapts it for episodic in-context inference through federated optimization. At test time, given a small labeled support set, the model diagnoses an unseen query through multimodal reasoning. On held-out respiratory and cardiac conditions, FSC achieves 71.6% accuracy in 2-way 2-shot evaluation, outperforming audio-language baselines by over 9%.
Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio
1️⃣ One-Sentence Summary
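This paper proposes Federated Self-Contextualization (FSC), a multimodal language model framework that performs clinical diagnosis from small numbers of audio samples (such as respiratory and heart sounds) held at different hospitals, without requiring large-scale annotated data; it achieves in-context learning through unsupervised clustering and federated learning, reaching over 71% accuracy in few-shot settings and outperforming existing methods by more than 9%.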
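To make the idea of pseudo-label episodes more concrete, here is a minimal sketch of how unsupervised clustering could turn unlabeled audio embeddings into N-way K-shot episodes on a single client. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the clustering algorithm, the audio encoder, or the episode format, so plain k-means, the toy 2-D "embeddings", and the helper names `kmeans` and `sample_episode` are all hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering).

    Returns one cluster id per point; these ids act as pseudo-labels.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign


def sample_episode(assign, n_way=2, k_shot=2, seed=0):
    """Sample one N-way K-shot episode as (sample_index, pseudo_label) pairs."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, c in enumerate(assign):
        by_cluster[c].append(idx)
    # A cluster is usable only if it can supply K support items plus one query.
    eligible = [c for c, idxs in by_cluster.items() if len(idxs) >= k_shot + 1]
    if len(eligible) < n_way:
        raise ValueError("not enough sufficiently large clusters")
    classes = rng.sample(eligible, n_way)
    support, candidates = [], []
    for c in classes:
        picked = rng.sample(by_cluster[c], k_shot + 1)
        support += [(i, c) for i in picked[:k_shot]]
        candidates.append((picked[k_shot], c))
    # The query is a held-out item from one of the sampled clusters.
    query = rng.choice(candidates)
    return support, query


# Toy stand-in for audio embeddings: two well-separated 2-D blobs.
data_rng = random.Random(42)
embeddings = [[data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)] for _ in range(10)]
embeddings += [[data_rng.gauss(5, 0.1), data_rng.gauss(5, 0.1)] for _ in range(10)]

assign = kmeans(embeddings, k=2)
support, query = sample_episode(assign, n_way=2, k_shot=2)
print("support:", support)
print("query:", query)
```

In this sketch the 2-way 2-shot episode contains four support items and one query, mirroring the evaluation setting quoted in the abstract. At training time the labels are cluster ids rather than diagnoses, which is what lets each client build episodes without real annotations.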