Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning
1️⃣ One-Sentence Summary
This paper finds that gradient-based multi-task learning analysis requires tasks to share enough training samples (at least 30%–40% overlap); otherwise the gradient signal is drowned out by noise. This provides the first principled explanation for the field's long-standing inconsistent results.
Multi-task learning shows strikingly inconsistent results -- sometimes joint training helps substantially, sometimes it actively harms performance -- yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement -- MoleculeNet operates at <5% overlap, TDC at 8-14% -- far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
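The abstract's core claim can be operationalized as a simple gate: measure the fraction of training instances two tasks share, and only trust gradient-alignment scores when that overlap clears the ~40% threshold where the paper reports the phase transition ends. The sketch below is illustrative, not the authors' implementation; the function names and the cosine-similarity choice of alignment measure are assumptions.

```python
import numpy as np

def sample_overlap(ids_a, ids_b):
    """Fraction of task A's training instances that also appear in task B."""
    shared = set(ids_a) & set(ids_b)
    return len(shared) / max(len(set(ids_a)), 1)

def gradient_alignment(grad_a, grad_b):
    """Cosine similarity between two per-task gradient vectors
    (one common way to score gradient conflict/agreement)."""
    grad_a = np.asarray(grad_a, dtype=float)
    grad_b = np.asarray(grad_b, dtype=float)
    denom = np.linalg.norm(grad_a) * np.linalg.norm(grad_b)
    return float(grad_a @ grad_b / denom) if denom > 0 else 0.0

def affinity_is_meaningful(ids_a, ids_b, threshold=0.4):
    """Gate gradient-based affinity on sample overlap.
    0.4 is the upper edge of the phase transition reported in the
    paper; below ~0.3 the correlations are noise-level."""
    return sample_overlap(ids_a, ids_b) >= threshold
```

Under this reading, benchmarks like MoleculeNet (<5% overlap) would fail the `affinity_is_meaningful` check, so gradient-conflict scores computed on them would be discarded rather than interpreted as task relationships.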
Source: arXiv: 2604.07848