Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio

📄 Abstract - Unlocking In-Context Learning in Audio-Language Models from Decentralized Medical Audio

Clinical audio diagnosis in low-resource settings requires models that identify conditions from minimal examples without large annotated corpora. We propose Federated Self-Contextualization (FSC), a multimodal language model framework for in-context clinical audio diagnosis across federated hospital clients. FSC constructs pseudo-label episodes via unsupervised clustering of audio representations, bypassing scarce real diagnostic labels, and enables contextual reasoning from support-query pairs. Our progressive three-stage pipeline first aligns audio embeddings with the language model via caption-based pretraining, then adapts it for episodic in-context inference through federated optimization. At test time, given a small labeled support set, the model diagnoses an unseen query through multimodal reasoning. On held-out respiratory and cardiac conditions, FSC achieves 71.6% accuracy in 2-way 2-shot evaluation, outperforming audio-language baselines by over 9%.
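
The pseudo-label episode construction described above can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the audio encoder, or the episode format, so everything here is an assumption: plain k-means stands in for the unsupervised clustering, short 2-D vectors stand in for audio embeddings, and the helper names `kmeans` and `sample_episode` are hypothetical, not part of FSC.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for FSC's clustering).

    Returns one cluster id per input point.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def sample_episode(labels, n_way=2, k_shot=2, n_query=1, seed=0):
    """Build one N-way K-shot episode using cluster ids as pseudo-labels.

    Returns (support, query), each a list of (sample_index, pseudo_label).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    # Only clusters with enough members can supply support and query items.
    eligible = [c for c, idxs in by_cluster.items()
                if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough sufficiently large clusters")
    support, query = [], []
    for pseudo_label, c in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_cluster[c], k_shot + n_query)
        support += [(i, pseudo_label) for i in picked[:k_shot]]
        query += [(i, pseudo_label) for i in picked[k_shot:]]
    return support, query


# Toy "embeddings": two well-separated blobs standing in for audio features.
_rng = random.Random(1)
embeddings = (
    [[_rng.gauss(0, 0.1), _rng.gauss(0, 0.1)] for _ in range(10)]
    + [[_rng.gauss(5, 0.1), _rng.gauss(5, 0.1)] for _ in range(10)]
)
labels = kmeans(embeddings, k=2)
support, query = sample_episode(labels, n_way=2, k_shot=2, n_query=1)
```

In this sketch no diagnostic label is ever consulted: the episode's class ids come only from cluster membership, which is the property the abstract attributes to FSC's pseudo-label episodes. How such episodes are serialized into prompts and optimized across federated clients is not covered here.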

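1️⃣ One-Sentence Summary

This paper proposes Federated Self-Contextualization (FSC), a multimodal language model framework that performs clinical diagnosis from small numbers of audio samples (such as respiratory and heart sounds) held at separate hospitals, without requiring large-scale annotated data; it enables in-context learning through unsupervised clustering and federated learning, reaching over 71% accuracy in few-shot settings and outperforming existing methods by more than 9%.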
