Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification

📄 Abstract

The joint optimization of image-based (I2I) and text-based (T2I) person re-identification (ReID) is hindered by modality discrepancies and conflicting training objectives, leading to suboptimal shared representations. While I2I ReID focuses on identity-level invariance across images of the same person, T2I ReID is driven by instance-specific textual descriptions tied to unique visual traits. This paper explores the fundamental differences between the two ReID tasks and their optimization processes for effective training. Because I2I and T2I ReID are often studied separately, the loss functions optimized for one retrieval setting may degrade the representation quality required by the other. Motivated by these findings, we propose a decoupled two-stage training pipeline for learning a shared representation across image and text modalities. The pipeline is based on a single vision encoder that supports both I2I and T2I retrieval while avoiding cross-task interference during training. We provide extensive experiments across multiple configurations, varying domain mixing procedures, learning strategies, and task objectives. We observe that I2I ReID pre-training improves generalization to T2I data. We also find that incorporating textual supervision during the vision encoder training stage enhances both I2I and T2I performance. We believe our insights provide a meaningful step toward unified ReID systems and cross-modal retrieval overall.

1️⃣ One-Sentence Summary

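This paper finds that image-based and text-based person re-identification have conflicting objectives when trained jointly, which leads to poor shared representations. To address this, it proposes a staged training method: first pre-train the vision encoder on the image task, then incorporate text supervision, improving performance on both tasks without mutual interference.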
