Variance Reduction on the Camera Axis: Multi-View Score Distillation for 3D
1️⃣ One-sentence summary
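Without retraining or multi-view data, the method aggregates gradients from several randomly sampled views per step (for example, views drawn in antipodal pairs) within a fixed UNet compute budget, lowering the variance of the gradient used to distill a 2D diffusion model into a 3D generator and thereby improving generation quality and alignment scores while reducing the number of optimization steps.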
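The sampling argument can be written as a standard Monte Carlo estimator. The notation below is ours, not the paper's: θ are the 3D parameters, g(θ; c) is the score-distillation gradient from camera c, σ² is its per-view variance (per coordinate), ρ is the correlation between the gradients at a camera and at its antipode c′ (which is distributed like c, so both share variance σ²), and B is the UNet-call budget, assuming one call per view as the halving of steps at K=2 suggests. The first line is the usual 1/K reduction for independent views; the second shows that an antithetic pair beats two independent views only when ρ < 0, which is the hoped-for effect of balanced angular coverage rather than a guarantee.

```latex
\hat{g}_K(\theta) = \frac{1}{K}\sum_{k=1}^{K} g(\theta; c_k),
\qquad
\operatorname{Var}\big[\hat{g}_K(\theta)\big] = \frac{\sigma^2}{K}
\quad \text{for i.i.d. } c_1, \dots, c_K

\operatorname{Var}\Big[\tfrac{1}{2}\big(g(\theta; c) + g(\theta; c')\big)\Big]
  = \frac{\sigma^2}{2}\,(1 + \rho),
\qquad c' = \text{antipode of } c

N_{\text{steps}} = \frac{B}{K}:
\quad B = 10{,}000 \;\Rightarrow\; N_{\text{steps}} = 10{,}000,\ 5{,}000,\ 2{,}500
\ \text{for } K = 1, 2, 4
```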
Score distillation turns a pretrained 2D diffusion model into a 3D generator, but the per-step gradient is estimated from a single randomly chosen view, so it is high-variance and blind to global shape consistency. Prior work addresses this by retraining the diffusion prior on multi-view data; this improves consistency but makes the contribution of view sampling inseparable from the quality of the prior. We instead isolate the sampling axis. The per-step gradient is one noisy sample of an expectation over views; aggregating K samples per step at a fixed total UNet budget reduces variance without touching the prior. We introduce Multi-View Aggregated Score Distillation (MV-SDI), which aggregates gradients from K views per step via gradient accumulation, keeping peak memory unchanged and the 2D prior frozen. MV-SDI draws views as antithetic antipodal pairs, a geometric choice independent of the prior, for balanced angular coverage. At a fixed budget of 10,000 UNet calls, K=2 raises CLIP R-Precision from 74.8% to 83.8% and CLIP score from 0.297 to 0.312, with consistent gains on HPSv2 and ImageReward and a 0.0% divergence rate on the 43-prompt benchmark; as a consequence, the number of optimization steps halves. K=4 gives a fourfold step reduction at R-Precision 86.9% and CLIP score 0.307, still well above the single-view baseline on every alignment metric. MV-SDI is compatible with gradient-based score-distillation pipelines, including Score Distillation via Inversion, and requires neither retraining nor multi-view data.
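A minimal, self-contained sketch of the per-step estimator, assuming a toy scalar "gradient" in place of a real UNet-based score-distillation gradient. Every name here (`sample_antipodal_azimuths`, `toy_view_gradient`, `aggregated_gradient`, `estimator_variance`, `steps_for_budget`) is hypothetical and not the paper's code. Only azimuth is sampled, and for K=4 the sketch simply draws two independent antipodal pairs, which may differ from the paper's scheme. The toy gradient has one component that flips sign under a 180° rotation (cancelled exactly by an antipodal pair) and one that does not (left untouched), so it shows both the 1/K effect of averaging and where antithetic pairing helps beyond it.

```python
import math
import random
import statistics


def sample_antipodal_azimuths(k, rng):
    """Draw k camera azimuths (degrees) as k // 2 antithetic antipodal pairs.

    Hypothetical helper: each pair is (phi, phi + 180 mod 360), phi ~ U[0, 360).
    Elevation, radius and field of view are left out of this sketch.
    """
    if k % 2 != 0:
        raise ValueError("antipodal pairing needs an even number of views")
    azimuths = []
    for _ in range(k // 2):
        phi = rng.uniform(0.0, 360.0)
        azimuths.extend([phi, (phi + 180.0) % 360.0])
    return azimuths


def toy_view_gradient(azimuth_deg):
    """Hypothetical stand-in for one per-view gradient (one 'UNet call').

    Mean over uniform azimuths is 1.0. The cos(a) term flips sign under a
    180-degree rotation; the 0.5 * cos(2a) term does not.
    """
    a = math.radians(azimuth_deg)
    return 1.0 + math.cos(a) + 0.5 * math.cos(2.0 * a)


def aggregated_gradient(k, rng, antipodal):
    """Average k per-view gradients for a single optimization step."""
    if antipodal:
        views = sample_antipodal_azimuths(k, rng)
    else:
        views = [rng.uniform(0.0, 360.0) for _ in range(k)]
    total = 0.0
    for az in views:
        # Summed one view at a time; in a real pipeline each term would be a
        # separate backward pass accumulated into the parameter gradients.
        total += toy_view_gradient(az)
    return total / k


def estimator_variance(k, antipodal, trials=20_000, seed=0):
    """Empirical variance of the aggregated per-step gradient estimate."""
    rng = random.Random(seed)
    samples = [aggregated_gradient(k, rng, antipodal) for _ in range(trials)]
    return statistics.pvariance(samples)


def steps_for_budget(unet_budget, k):
    """Optimization steps affordable when every step spends k UNet calls."""
    return unet_budget // k


if __name__ == "__main__":
    print("K=1 iid      ", round(estimator_variance(1, antipodal=False), 3))
    print("K=2 iid      ", round(estimator_variance(2, antipodal=False), 3))
    print("K=2 antipodal", round(estimator_variance(2, antipodal=True), 3))
    print("steps at 10,000 calls:", steps_for_budget(10_000, 2), steps_for_budget(10_000, 4))
```

For this toy function the exact variances are 0.625 (one view), 0.3125 (two independent views) and 0.125 (one antipodal pair), and the printed estimates should land close to those; the last line prints 5000 and 2500 steps. Real per-view gradients will not split this cleanly into sign-flipping and symmetric parts, so how much pairing gains over independent draws is an empirical question.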
Source: arXiv: 2606.29964