Gen2Balance: Generative Balancing for Long-Tailed Video Action Recognition
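1️⃣ One-sentence summary
This paper proposes Gen2Balance, which uses a text-to-video generative model to automatically supplement training samples for scarce classes, converting an imbalanced dataset into a balanced combination of real and generated videos. This significantly improves accuracy on long-tailed video action recognition, especially for rare actions, and partial balancing delivers most of the performance gain at a lower compute cost.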
We address the problem of training on long-tailed data for video action recognition. We propose to augment the training set using a text-to-video generative model, conditioned on diverse text prompts grounded in action profiles and training exemplars. Our approach, called Gen2Balance, converts an imbalanced training set into a balanced combination of real and generated video clips. To effectively learn from such data, we employ a two-stage training strategy that mitigates domain shift and yields significant improvements. We evaluate on long-tailed versions of standard benchmarks: UCF-101 (UCF-LT) and a 100-class subset of Kinetics (K100-LT) selected to prioritise temporally challenging actions. Gen2Balance improves accuracy over the strongest baselines for long-tailed learning by 5.1% and 7.0% on the respective datasets. On rare actions from the RareAct dataset (e.g., cut keyboard), Gen2Balance improves accuracy by 31.9%, demonstrating effectiveness for scarce actions. By varying the amount of synthetic data added, we show that partial balancing already achieves 79% of the performance gains at 27% of the compute cost on K100-LT, highlighting the practical scalability of Gen2Balance.
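To make the balancing idea concrete, here is a minimal sketch of how a per-class generation budget could be computed. It is not the authors' implementation: the function name `generation_budget`, the choice of the largest class as the balancing target, and the `fraction` parameter used to model partial balancing are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import Counter


def generation_budget(labels, fraction=1.0):
    """Return how many synthetic clips to add per class (hypothetical helper).

    Assumption: the balancing target is the size of the largest class.
    fraction=1.0 closes each class's gap to that target completely;
    smaller values close only that share of the gap (partial balancing).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    counts = Counter(labels)
    target = max(counts.values())
    # Head classes already at the target receive no synthetic clips.
    return {cls: round(fraction * (target - n)) for cls, n in counts.items()}


# Toy long-tailed label list: one head class and two tail classes.
labels = ["walk"] * 100 + ["juggle"] * 10 + ["cut keyboard"] * 2
full = generation_budget(labels)          # fully balanced
partial = generation_budget(labels, 0.5)  # half of each gap closed
```

Under these assumptions, the total number of clips to generate scales linearly with `fraction`, which is one simple way to trade generation cost against balance. The compute and accuracy figures reported in the abstract come from the paper's own experiments and are not modeled here.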
Source: arXiv: 2606.22416