📄 Abstract - Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training
Cross-partition edges dominate the cost of distributed GNN training: fetching remote features and activations per iteration overwhelms the network as models deepen and partition counts grow. Grappa is a distributed GNN training framework that enforces gradient-only communication: during each iteration, partitions train in isolation and exchange only gradients for the global update. To recover the accuracy lost to isolation, Grappa (i) periodically repartitions the graph to expose new neighborhoods and (ii) applies a lightweight coverage-corrected gradient aggregation inspired by importance sampling. We prove the corrected estimator is asymptotically unbiased under standard support and boundedness assumptions, and we derive a batch-level variant, compatible with common deep-learning packages, that minimizes mean-squared deviation from the ideal node-level correction. We also introduce a shrinkage version that improves stability in practice. Empirical results on real and synthetic graphs show that Grappa trains GNNs 4 times faster on average (up to 13 times) than state-of-the-art systems, achieves better accuracy, especially for deeper models, and sustains training at the trillion-edge scale on commodity hardware. Grappa is model-agnostic, supports both full-graph and mini-batch training, and does not rely on high-bandwidth interconnects or caching.
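The abstract only names the coverage-corrected aggregation, so the sketch below fills in one plausible node-level form: each node's gradient contribution is up-weighted by the inverse of an estimated coverage probability, in the style of importance sampling. The function `corrected_aggregate`, the `coverage` array, and the toy shapes are illustrative assumptions, not Grappa's actual estimator; the paper's batch-level variant would replace the per-node weights with a single weight per batch.

```python
import numpy as np

def corrected_aggregate(node_ids, local_grads, coverage, num_nodes):
    """Sum per-node gradients with inverse-coverage weights (hypothetical sketch).

    node_ids    : list of 1-D index arrays, the nodes owned by each partition.
    local_grads : list of (len(node_ids[p]), dim) arrays, gradients computed
                  in isolation on each partition's local subgraph.
    coverage    : (num_nodes,) estimated probability that a node's neighborhood
                  is adequately exposed under the current partitioning.
    """
    dim = local_grads[0].shape[1]
    agg = np.zeros((num_nodes, dim))
    for ids, g in zip(node_ids, local_grads):
        # Inverse-probability weighting, as in importance sampling: nodes whose
        # neighborhoods are rarely exposed get up-weighted, so the aggregate
        # stays (asymptotically) unbiased when all coverages are positive.
        agg[ids] += g / np.clip(coverage[ids], 1e-6, None)[:, None]
    return agg.mean(axis=0)

# Toy usage: 8 nodes split across 2 partitions, 4-dimensional gradients.
rng = np.random.default_rng(0)
ids = [np.arange(0, 5), np.arange(5, 8)]
grads = [rng.normal(size=(len(i), 4)) for i in ids]
cov = rng.uniform(0.3, 1.0, size=8)
print(corrected_aggregate(ids, grads, cov, num_nodes=8))
```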
Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training
1️⃣ One-Sentence Summary
This paper proposes Grappa, a new distributed graph neural network training framework. It sharply reduces network communication overhead by having each partition train independently and exchange only gradients, while combining periodic graph repartitioning with a lightweight gradient-correction method to preserve model accuracy, making training several times faster than existing systems and enabling it to scale to trillion-edge graphs.
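To make the summary concrete, here is a minimal, purely illustrative training-loop skeleton: partitions compute gradients in isolation, only the gradient sum crosses the network, and the graph is repartitioned on a fixed period. `partition_graph` and `local_gradient` are stand-ins (random partitioning, a toy quadratic loss), not Grappa's partitioner or its GNN model.

```python
import numpy as np

def partition_graph(num_nodes, num_parts, seed):
    # Stand-in partitioner: random node assignment (a real system would use an
    # edge-cut-minimizing partitioner).
    return np.random.default_rng(seed).integers(num_parts, size=num_nodes)

def local_gradient(params, node_ids):
    # Toy per-node quadratic loss, computed without any remote fetches.
    targets = node_ids[:, None] / 100.0
    return (params[None, :] - targets).sum(axis=0)

num_nodes, num_parts, dim = 100, 4, 8
params = np.zeros(dim)
lr, repartition_every = 0.01, 5

for step in range(20):
    if step % repartition_every == 0:
        # (i) periodic repartitioning exposes new neighborhoods over time.
        assign = partition_graph(num_nodes, num_parts, seed=step)
    local_grads = [local_gradient(params, np.where(assign == p)[0])
                   for p in range(num_parts)]
    # Gradient-only communication: the sole cross-partition exchange is the
    # gradient sum used for the global parameter update.
    global_grad = np.sum(local_grads, axis=0) / num_nodes
    params -= lr * global_grad

print(np.round(params, 3))
```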