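Improving Cross-Format Robustness in Language Models with Multi-Format Training
1️⃣ One-Sentence Summary
This paper proposes and validates a lightweight multi-format training method (FormatMix). By expanding about 30% of the training data into multiple semantically equivalent answer formats, it markedly improves the consistency and task performance of large language models across different question formats, avoiding failures caused by a change of format.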
Large language models often remain sensitive to answer format: a question solved correctly in one format may fail in another, semantically equivalent one. To study this gap, we define cross-format robustness as the extent to which a model answers the same underlying question consistently across formats. We then compare full-format training with FormatMix, which expands only a subset of training items into multiple equivalent formats, chosen by either random or targeted selection. Across GLM4 and Llama-3.1, multi-format supervision consistently improves both task performance and cross-format robustness, whereas multiple-choice question (MCQ)-only supervision brings little benefit and can even reduce robustness. We further find that expanding only about 30% of the training set into multiple formats often recovers most of the gain from full-format training, and this effect appears across the model families and sizes we study. These results suggest that format diversity, rather than additional supervision alone, is the key driver of robustness, and that lightweight multi-format augmentation is a practical way to make LLMs less sensitive to answer format without changing the base model.
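To make the idea concrete, here is a minimal Python sketch of the random-selection variant of subset expansion, together with one possible consistency score. It is an illustration under stated assumptions, not the authors' implementation: the three renderers (`as_mcq`, `as_open_ended`, `as_cloze`), the item schema, the function names `format_mix` and `cross_format_consistency`, and the "correct in every format" scoring rule are all hypothetical. The abstract does not specify the actual format set, the targeted-selection criterion, or the exact robustness metric.

```python
import random
from collections import defaultdict

# Hypothetical renderers: each turns one underlying item into a prompt/target
# pair in a different, semantically equivalent answer format.
def as_mcq(item):
    options = " ".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(item["options"]))
    letter = chr(65 + item["options"].index(item["answer"]))
    return {"prompt": f"{item['question']} {options}", "target": letter}

def as_open_ended(item):
    return {"prompt": item["question"], "target": item["answer"]}

def as_cloze(item):
    return {"prompt": f"{item['question']} The answer is ____.", "target": item["answer"]}

FORMATS = {"mcq": as_mcq, "open": as_open_ended, "cloze": as_cloze}

def format_mix(items, expand_fraction=0.3, base_format="mcq", seed=0):
    """Expand a random subset of items into every format; keep the rest in base_format.

    Assumption: random selection only. A targeted variant would replace the
    rng.sample call with a selection rule of its own.
    """
    rng = random.Random(seed)
    n_expand = round(expand_fraction * len(items))
    chosen = set(rng.sample(range(len(items)), n_expand))
    examples = []
    for idx, item in enumerate(items):
        names = list(FORMATS) if idx in chosen else [base_format]
        for name in names:
            examples.append({"id": idx, "format": name, **FORMATS[name](item)})
    return examples

def cross_format_consistency(results):
    """Fraction of items answered correctly in every evaluated format.

    results: iterable of (item_id, format_name, is_correct) triples.
    Assumption: this is one plausible reading of "consistent across formats".
    """
    by_item = defaultdict(list)
    for item_id, _format_name, is_correct in results:
        by_item[item_id].append(bool(is_correct))
    if not by_item:
        return 0.0
    return sum(all(flags) for flags in by_item.values()) / len(by_item)
```

With 10 items and `expand_fraction=0.3`, three items are rendered in all three formats and the other seven stay MCQ-only, so the training set grows from 10 to 16 examples rather than to the 30 that full-format expansion would produce.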
Source: arXiv: 2606.11643