
arXiv submission date: 2026-04-15
📄 Abstract - A Study of Failure Modes in Two-Stage Human-Object Interaction Detection

Human-object interaction (HOI) detection aims to detect interactions between humans and objects in images. While recent advances have improved performance on existing benchmarks, their evaluations mainly focus on overall prediction accuracy and provide limited insight into the underlying causes of model failures. In particular, modern models often struggle in complex scenes involving multiple people and rare interaction combinations. In this work, we present a study to better understand the failure modes of two-stage HOI models, which form the basis of many current HOI detection approaches. Rather than constructing a large-scale benchmark, we decompose HOI detection into multiple interpretable perspectives and analyze model behavior along these dimensions to study different types of failure patterns. We curate a subset of images from an existing HOI dataset, organized by human-object-interaction configurations (e.g., multi-person interactions and object sharing), and analyze model behavior under these configurations to examine different failure modes. This design allows us to analyze how these HOI models behave under different scene compositions and why their predictions fail. Importantly, high overall benchmark performance does not necessarily reflect robust visual reasoning about human-object relationships. We hope that this study can provide useful insights into the limitations of HOI models and offer observations for future research in this area.
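The abstract describes grouping images by scene configuration (multi-person interactions, object sharing) and mentions rare interaction combinations, but it does not give the exact grouping rules. The sketch below illustrates one plausible way to tag images under stated assumptions; it is not the authors' procedure. The annotation schema (`HOIAnnotation`), the tag names, the helper functions, and the `rare_threshold` cutoff are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HOIAnnotation:
    """One annotated <human, verb, object> triplet (hypothetical schema)."""
    human_id: int
    object_id: int
    object_category: str
    verb: str


def combination_counts(dataset):
    """Count how often each (verb, object_category) combination occurs."""
    return Counter(
        (a.verb, a.object_category)
        for annotations in dataset.values()
        for a in annotations
    )


def configuration_tags(annotations, counts, rare_threshold=10):
    """Assign hypothetical scene-configuration tags to one image."""
    tags = set()
    humans = {a.human_id for a in annotations}
    if len(humans) == 1:
        tags.add("single-person")
    elif len(humans) > 1:
        tags.add("multi-person")
    # An object counts as "shared" when two or more distinct humans interact with it.
    humans_per_object = defaultdict(set)
    for a in annotations:
        humans_per_object[a.object_id].add(a.human_id)
    if any(len(h) > 1 for h in humans_per_object.values()):
        tags.add("object-sharing")
    # Assumed rarity rule: a combination seen fewer than rare_threshold times overall.
    if any(counts[(a.verb, a.object_category)] < rare_threshold for a in annotations):
        tags.add("rare-combination")
    return tags


def group_by_configuration(dataset, rare_threshold=10):
    """Map each tag to the list of image ids that carry it."""
    counts = combination_counts(dataset)
    groups = defaultdict(list)
    for image_id, annotations in dataset.items():
        for tag in sorted(configuration_tags(annotations, counts, rare_threshold)):
            groups[tag].append(image_id)
    return dict(groups)
```

For example, an image where two people sit at the same table would be tagged both `multi-person` and `object-sharing`, so per-tag evaluation can reveal whether a model's errors concentrate in shared-object scenes rather than showing up only as a lower overall score. Tags are allowed to overlap because real scenes often combine several of these difficulties.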

Top-level tags: computer vision, model evaluation, benchmark
Detailed tags: human-object interaction, failure analysis, two-stage detection, scene understanding, model limitations

A Study of Failure Modes in Two-Stage Human-Object Interaction Detection


1️⃣ One-Sentence Summary

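By analyzing the specific failure modes of two-stage human-object interaction detection models in complex scenes (such as multi-person interactions and rare interaction combinations), this paper shows that high overall performance does not imply robust visual reasoning, offering a new perspective on model limitations and directions for future research.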

Source: arXiv 2604.13448