KidRisk: Benchmark Dataset for Children Dangerous Action Recognition
1️⃣ One-sentence summary
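The paper constructs KidRisk, a dataset of children's dangerous actions containing 2,500 short videos and 10,000 images, and proposes methods based on vision-language models that reach 83.53% accuracy on children's action classification and 96.14% on dangerous action recognition, offering an efficient and practical technical approach to child safety monitoring.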
Children are naturally energetic, and during spontaneous activities they often encounter potentially dangerous situations, especially without parental supervision. Identifying risky actions is therefore crucial to keeping them safe. This paper builds a novel and challenging dataset, KidRisk, comprising 2,500 short videos of children's actions and 10,000 images of children's dangerous actions. We also introduce a benchmark on the newly constructed dataset and find that traditional deep learning models show limited effectiveness on these tasks. We therefore develop vision-language-based baselines that draw on these models' strong contextual understanding of visual information. Our proposed methods achieve an accuracy of 83.53% in classifying children's actions and 96.14% in recognizing children's dangerous actions, significantly outperforming traditional approaches. These results indicate that vision-language models are both feasible and highly effective for detecting hazardous actions, and can contribute to safeguarding children.
Source: arXiv: 2606.25298