arXiv submission date: 2026-06-21
📄 Abstract - Gen2Balance: Generative Balancing for Long-Tailed Video Action Recognition

We address the problem of training on long-tailed data for video action recognition. We propose to augment the training set using a text-to-video generative model, conditioned on diverse text prompts grounded in action profiles and training exemplars. Our approach, called Gen2Balance, converts an imbalanced training set into a balanced combination of real and generated video clips. To effectively learn from such data, we employ a two-stage training strategy that mitigates domain shift and yields significant improvements. We evaluate on long-tailed versions of standard benchmarks: UCF-101 (UCF-LT) and a 100-class subset of Kinetics (K100-LT) selected to prioritise temporally challenging actions. Gen2Balance improves accuracy over the strongest baselines for long-tailed learning by 5.1% and 7.0% on the respective datasets. On rare actions from the RareAct dataset (e.g., cut keyboard), Gen2Balance improves accuracy by 31.9%, demonstrating effectiveness for scarce actions. By varying the amount of synthetic data added, we show that partial balancing already achieves 79% of the performance gains at 27% of the compute cost on K100-LT, highlighting the practical scalability of Gen2Balance.
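The balancing step can be illustrated with a small sketch. The abstract does not say how Gen2Balance allocates generated clips per class, so the following rests on an assumption: every class is topped up toward the size of the largest class, and a hypothetical `fraction` parameter scales that top-up to mimic partial balancing. The function name `generation_budget` and its signature are illustrative and do not come from the paper.

```python
from collections import Counter


def generation_budget(labels, fraction=1.0):
    """Return how many synthetic clips to add for each class.

    Assumption (not from the paper): the target size is the count of the
    largest class. fraction=1.0 fills each class's full deficit; smaller
    values fill only that share of the deficit (partial balancing).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    counts = Counter(labels)
    target = max(counts.values())
    # Round down so the budget never pushes a class past the target size.
    return {cls: int(fraction * (target - n)) for cls, n in counts.items()}


# Toy long-tailed label list: one head class and two tail classes.
toy_labels = ["swim"] * 10 + ["juggle"] * 4 + ["cut keyboard"] * 1
full = generation_budget(toy_labels)
half = generation_budget(toy_labels, fraction=0.5)
```

With full balancing the rarest class receives the largest budget; lowering `fraction` shrinks every budget proportionally, which is one simple way to trade generation cost against balance.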

Top-level tags: video, aigc, machine learning
Detailed tags: long-tailed recognition, video action recognition, text-to-video generation, data augmentation, imbalanced learning

Gen2Balance: Generative Balancing for Long-Tailed Video Action Recognition


1️⃣ One-sentence summary

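This paper proposes Gen2Balance, which uses a text-to-video generative model to automatically supply training samples for scarce classes, converting an imbalanced dataset into a balanced combination of real and generated videos. This significantly improves accuracy in long-tailed video action recognition, with especially strong gains on rare actions, and partial balancing achieves most of the performance improvement at a lower compute cost.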

Source: arXiv: 2606.22416