Gen2Balance: Generative Balancing for Long-Tailed Video Action Recognition

📄 Abstract - Gen2Balance: Generative Balancing for Long-Tailed Video Action Recognition

We address the problem of training on long-tailed data for video action recognition. We propose to augment the training set using a text-to-video generative model, conditioned on diverse text prompts grounded in action profiles and training exemplars. Our approach, called Gen2Balance, converts an imbalanced training set into a balanced combination of real and generated video clips. To effectively learn from such data, we employ a two-stage training strategy that mitigates domain shift and yields significant improvements. We evaluate on long-tailed versions of standard benchmarks: UCF-101 (UCF-LT) and a 100-class subset of Kinetics (K100-LT) selected to prioritise temporally challenging actions. Gen2Balance improves accuracy over the strongest baselines for long-tailed learning by 5.1% and 7.0% on the respective datasets. On rare actions from the RareAct dataset (e.g., cut keyboard), Gen2Balance improves accuracy by 31.9%, demonstrating effectiveness for scarce actions. By varying the amount of synthetic data added, we show that partial balancing already achieves 79% of the performance gains at 27% of the compute cost on K100-LT, highlighting the practical scalability of Gen2Balance.
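The balancing step described in the abstract can be illustrated with a small sketch of how a per-class generation quota might be computed. This is not the paper's implementation. The function name `synthetic_quota`, the choice of the largest class as the balancing target, and the `fraction` parameter as a stand-in for partial balancing are all assumptions made for illustration.

```python
import math
from collections import Counter


def synthetic_quota(labels, fraction=1.0):
    """Return how many generated clips each class would need (hypothetical helper).

    Assumption: the balancing target is the size of the largest class.
    fraction=1.0 fills every class up to that target; a smaller value
    closes only that fraction of each class's gap (partial balancing).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    counts = Counter(labels)
    target = max(counts.values())
    # Round down so the quota never overshoots the target.
    return {cls: math.floor(fraction * (target - n)) for cls, n in counts.items()}


# Toy long-tailed label list: one head class and two tail classes.
labels = ["run"] * 10 + ["jump"] * 4 + ["cut keyboard"] * 1
full = synthetic_quota(labels)
partial = synthetic_quota(labels, fraction=0.5)
```

Under these assumptions, `full` asks for no clips for the head class and tops up each tail class to 10 clips, while `partial` requests roughly half as many generated clips, trading some balance for lower generation cost.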


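1️⃣ One-Sentence Summary

This paper proposes Gen2Balance, which uses a text-to-video generative model to automatically supplement training samples for scarce classes, converting an imbalanced dataset into a balanced combination of real and generated videos; this significantly improves accuracy on long-tailed video action recognition, with especially strong gains on rare actions, and partial balancing achieves most of the performance gain at a lower compute cost.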
